NONINFORMATIVE  PRIORS,  CREDIBLE  SETS,  AND 
BAYESIAN  HYPOTHESIS  TESTING 


By 

JUNGEUN  HEO 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


2000 


Copyright  2000 
by 

Jungeun  Heo 


to  my  family,  with  regards 


ACKNOWLEDGMENTS 


I would  like  to  acknowledge,  thank,  and  bless  my  advisor,  Malay  Ghosh,  for  the 
generous  gift  of  his  time,  and  his  good  sense  of  direction.  Without  his  unbounded 
kindness  and  his  encouragement,  this  work  would  never  have  been  completed.  I thank 
Dr.  Ramon  Littell,  Dr.  Andrew  Rosalsky,  Dr.  James  Robert,  and  Dr.  Irene  Hueter 
for  serving  on  my  dissertation  committee.  I thank  Dr.  Ronald  Randles,  Chairman 
of  the  Department  of  Statistics,  for  supporting  and  encouraging  me  throughout  my 
years  at  the  University  of  Florida.  I will  remember  his  kindness  and  beautiful  smile 
always. 

I thank  my  mother,  Sunnan  Kim,  for  instilling  in  me  the  philosophy,  “Nothing 
ventured,  nothing  gained.”  Her  love  and  her  tremendous  belief  in  me  fostered  eventual 
belief  in  myself.  I thank  my  grandmother,  sisters,  and  brothers-in-law  for  their  love 
and  encouragement.  I thank  my  two  nieces,  Suyun  and  Sujin,  for  being  a glorious 
joy  to  me.  I thank  my  fiancee,  Byungyun  Joung,  for  his  love,  his  endless  care,  and 
his  prayers  that  are  a source  of  inspiration. 

I wish  to  express  my  special  thanks  to  Cynthia  Garvan  who  has  supported  me 
during  my  work  in  the  Division  of  Biostatistics.  I would  like  to  thank  all  my  colleagues 
and  friends  for  their  assistance  and  continuous  prayers. 




TABLE  OF  CONTENTS 


ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT

CHAPTERS

1 INTRODUCTION
    1.1 A Bayesian Primer
    1.2 Literature Review
    1.3 The Subject of This Dissertation

2 INTRACLASS MODELS
    2.1 Introduction
    2.2 Noninformative Priors
        2.2.1 Fisher Information Matrix
        2.2.2 Quantile Matching Priors
        2.2.3 Matching Based on Highest Posterior Density (HPD) Regions
        2.2.4 Matching Based on Inversion of Likelihood Ratio Statistics
        2.2.5 Reference Priors
    2.3 Propriety of the Posterior Distributions
    2.4 Computer Simulation
        2.4.1 Method
        2.4.2 Results
    2.5 Divergence Measures
    2.6 Concluding Remarks

3 FIRST ORDER AUTOREGRESSIVE MODELS
    3.1 Introduction
    3.2 Development of Noninformative Priors
        3.2.1 Fisher Information Matrix
        3.2.2 Reference Priors
        3.2.3 Matching Priors
        3.2.4 Matching Based on Highest Posterior Density (HPD) Regions
    3.3 Propriety of the Posterior Distributions
    3.4 Divergence Measures
    3.5 Concluding Remarks

4 FAMILIAL DATA MODELS
    4.1 Introduction
    4.2 Development of Noninformative Priors
        4.2.1 Fisher Information Matrix
        4.2.2 Reference Priors
        4.2.3 Probability Matching Priors
    4.3 Propriety of the Posterior Distributions
    4.4 Further Priors
    4.5 Simulation Study
        4.5.1 Method
        4.5.2 Results
    4.6 Concluding Remarks

5 SUMMARY AND FUTURE RESEARCH
    5.1 Summary
    5.2 Ideas for Future Research

REFERENCES

BIOGRAPHICAL SKETCH




LIST  OF  TABLES 


Table

2.1 Estimated Tail Probabilities of Posterior Distributions under the One-at-a-time Reference Prior ($\pi_R$) and Jeffreys’ Prior ($\pi_J$) and Different Sample Sizes

2.2 Credible Intervals for $\rho$ Based on Different Procedures

3.1 Estimated Tail Probabilities of Posterior Distributions under the Two Group Reference Prior ($\pi_{R2}$), Three Group Reference Prior ($\pi_{R3}$) and Jeffreys’ Prior ($\pi_J$) and Different Sample Sizes

3.2 Credible Intervals for $\rho$ Based on Different Procedures

4.1 Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms}=0.1$ and $\rho_{ss}=0.9$

4.2 Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms}=0.3$ and $\rho_{ss}=0.7$

4.3 Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms}=0.5$ and $\rho_{ss}=0.5$




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 


NONINFORMATIVE  PRIORS,  CREDIBLE  SETS,  AND 
BAYESIAN  HYPOTHESIS  TESTING 


By 

Jungeun  Heo 
August  2000 

Chairman:  Malay  Ghosh 
Major  Department:  Statistics 

The  objective  of  my  present  work  is  to  provide  a Bayesian  analysis  of  selected 
models  (an  intraclass  model,  an  autoregressive  model,  and  a familial  data  model) 
based  on  noninformative  priors.  Probability  matching  priors  and  reference  priors 
along  with  Jeffreys’  prior  are  considered.  The  propriety  of  posteriors  under  these 
priors  is  investigated  for  the  various  models.  These  priors  are  compared  in  light  of 
how  accurately  the  coverage  probabilities  of  Bayesian  credible  intervals  match  the 
corresponding  frequentist  coverage  probabilities. 




CHAPTER  1 
INTRODUCTION 


This chapter contains a general introduction to the Bayesian method, a literature
review of pertinent articles and books related to noninformative priors, and an
overview of the dissertation.


1.1  A Bayesian  Primer 


Bayesian analysis is perhaps best explained by contrast to frequentist (or classical)
statistical analysis. In both approaches, we may let $\theta$ (possibly a vector or matrix)
represent the state of nature in question and let $\Theta$ represent all possible states of
nature. In the frequentist paradigm, we consider $\theta$ to be fixed and unknown. In
the Bayesian paradigm, $\theta$ possesses a distribution that quantifies prior belief (or past
experience) about how likely $\theta$ is to assume a value in $\Theta$. We call this the prior
distribution. In a frequentist framework, inference about $\theta$ is based only on the
use of sample information. Essentially, Bayesian analysis is performed by combining
the prior information and the sample information into what is called the posterior
distribution of $\theta$, given the data.

To  make  the  case  for  incorporating  prior  information,  Berger  (1985)  cites  Savage’s 
(1961)  compelling  examples  of  the  possible  importance  of  prior  information.  Consider 
the  following  statistical  experiments: 

• A lady  who  adds  milk  to  her  tea  claims  to  be  able  to  tell  whether  the  tea  or  the 
milk was poured into the cup first. In ten trials conducted to test this, she
determines correctly each time.

• A music  expert  claims  to  be  able  to  distinguish  a page  of  Haydn  score  from  a 
page  of  Mozart  score.  In  ten  trials  conducted  to  test  this,  he  is  correct  each 
time. 

• A drunken  friend  says  he  can  predict  the  outcome  of  a flip  of  a fair  coin.  In  ten 
trials  conducted  to  test  this,  he  makes  a correct  determination  each  time. 

In all three situations, a frequentist analysis gives strong evidence that the claims are
valid. That is, a frequentist analysis would be identical in all cases. No consideration
would be given to the knowledge that it is an expert distinguishing the music scores,
that it is a drunk predicting the coin toss, or, as in the first case, that the prior
information is vague. But in these three statistical situations, prior information
definitely cannot be ignored.
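
To see why the three frequentist analyses coincide, one may compute the tail probability of ten correct answers in ten trials under pure guessing. The sketch below does this for a one-sided binomial test; the function name `binomial_upper_tail` and the guessing probability of 1/2 are assumptions made for this illustration and are not taken from Berger (1985).

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Ten successes in ten trials, assuming each answer is a fair guess.
p_value = binomial_upper_tail(10, 10)
print(p_value)
```

The same number results for the lady, the music expert, and the drunken friend, since the calculation uses only the counts and not the identity of the claimant.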

To illustrate Bayesian inference we consider a concrete example given in Berger
(1985). Suppose a child is given an intelligence test, and the test score $Y$ is $N(\theta, 100)$,
where $\theta$ is the true IQ (intelligence level) of the child. In other words, if the child
were to take a large number of independent similar tests, his average score would be
about $\theta$. Assume also that, in the population as a whole, $\theta$ is distributed according
to $N(100, 225)$. Let the prior of $\theta$ be denoted by $\pi(\theta)$. Then in this example

$$\pi(\theta) = N(100, 225).$$

The distribution of the random variable $Y$, conditional on $\theta$, is denoted by $f(y|\theta)$.
For this example

$$f(y|\theta) = N(\theta, 100).$$

The posterior distribution of $\theta$ given $y$ is denoted by $\pi(\theta|y)$, and as the notation
indicates, is defined as the conditional distribution of $\theta$ given the sample observation
$y$. According to Bayes’ rule, the posterior is obtained as

$$\pi(\theta|y) \propto \pi(\theta) f(y|\theta),$$

where the constant of proportionality depends only on the data.

The name “posterior distribution” is indicative of the role of $\pi(\theta|y)$. Just as the
prior distribution reflects beliefs about $\theta$ prior to experimentation, $\pi(\theta|y)$ reflects the
updated beliefs about $\theta$ after (posterior to) observing the sample $y$. In other words,
the posterior distribution combines prior beliefs about $\theta$ with the information about
$\theta$ contained in sample $y$. Now, return to the IQ example and suppose a child scores
115 on the IQ test. The posterior distribution of his IQ is found as

$$\pi(\theta|y) \propto \frac{1}{\sqrt{2\pi(225)}} \exp\left[-\frac{(\theta - 100)^2}{2(225)}\right] \times \frac{1}{\sqrt{2\pi(100)}} \exp\left[-\frac{(115 - \theta)^2}{2(100)}\right].$$

On algebraic manipulations we observe that $\pi(\theta|y)$ is a normal density with a mean
of 110.39 and a variance of 69.23.
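
The algebra behind the posterior mean and variance can be checked numerically. The following is a minimal sketch of the standard normal-normal conjugate update (precisions add, and the posterior mean is the precision-weighted average of the prior mean and the observation); the helper name `normal_posterior` is hypothetical and introduced only for this illustration.

```python
def normal_posterior(prior_mean, prior_var, y, data_var):
    """Posterior mean and variance of theta when theta ~ N(prior_mean, prior_var)
    and a single observation y | theta ~ N(theta, data_var)."""
    precision = 1.0 / prior_var + 1.0 / data_var      # precisions add
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + y / data_var)
    return post_mean, post_var


# IQ example: prior N(100, 225), test score 115 with sampling variance 100.
post_mean, post_var = normal_posterior(100.0, 225.0, 115.0, 100.0)
print(post_mean, post_var)
```

The computed values agree with the mean and variance quoted above up to rounding in the second decimal place.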

Bayesian inference is often reported in terms of credible sets, which are an analog
of classical confidence sets. A $100(1 - \alpha)\%$ credible set for $\theta$ is a subset $C$ of $\Theta$ such
that

$$1 - \alpha \le \int_C \pi(\theta|y)\,d\theta.$$

Since the posterior distribution is an actual probability distribution for $\theta$, we can
speak of the probability that $\theta$ is in $C$. This is in contrast to classical confidence
intervals, which can be interpreted only in terms of coverage probability (the probability
that random $Y$ is such that the confidence set $C(Y)$ contains $\theta$). In choosing a
credible set for $\theta$, we usually try to minimize its size. To do this, we include in the
set only those points with the largest posterior density, i.e., the “most likely” values
of $\theta$. (Actually, this minimizes the volume of the credible set.) A $100(1 - \alpha)\%$ HPD
(highest posterior density) credible set for $\theta$ is the subset $C$ of $\Theta$ of the form

$$C = \{\theta \in \Theta : \pi(\theta|y) \ge k(\alpha)\},$$

where $k(\alpha)$ is the largest constant such that

$$P(C|y) \ge 1 - \alpha.$$

In the IQ example, when the child scores 115 on the intelligence test, we have a
$N(110.39, 69.23)$ posterior distribution for $\theta$. Since this posterior distribution is
unimodal and symmetric about 110.39, a 95% HPD credible set for $\theta$ is

$$\left(110.39 - 1.96(69.23)^{1/2},\ 110.39 + 1.96(69.23)^{1/2}\right) = (94.08, 126.70).$$

Since a test score $Y$ is $N(\theta, 100)$, the classical 95% confidence interval for $\theta$ is

$$\left(115 - 1.96(100)^{1/2},\ 115 + 1.96(100)^{1/2}\right) = (95.4, 134.6).$$

Thus by using the available prior information (the distribution of IQ scores), the
Bayesian method incorporates information arising from sources other than the
statistical investigation.
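
The two intervals can be reproduced with a few lines of code. This is a sketch under the assumption that both intervals are symmetric normal intervals of the form center plus or minus $z_{0.975}$ standard deviations; the helper `symmetric_normal_interval` is a hypothetical name, and the exact normal quantile is used in place of the rounded value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def symmetric_normal_interval(center, variance, level=0.95):
    """Interval center +/- z * sd, where z is the (1 + level)/2 standard normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance)
    return center - half_width, center + half_width


# 95% HPD credible set from the N(110.39, 69.23) posterior.
hpd = symmetric_normal_interval(110.39, 69.23)
# Classical 95% confidence interval from Y ~ N(theta, 100) with y = 115.
classical = symmetric_normal_interval(115.0, 100.0)

print(hpd, classical)
print(hpd[1] - hpd[0], classical[1] - classical[0])  # interval lengths
```

The credible set is shorter than the confidence interval because the posterior variance (69.23) is smaller than the sampling variance (100).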

As in the above example, we need to elicit a prior density for $\theta$ to use the Bayesian
machinery. But the most frequent criticism of Bayesian analysis is that different
reasonable priors often yield different answers, casting doubt on the prospect of
objectivity. Thus, in order to achieve objectivity, it seems necessary to do Bayesian
analysis with “noninformative priors” or “default priors.” In the present work we
focus on the use of noninformative priors. Noninformative priors are discussed in the
literature review.




1.2  Literature  Review 


Bayesian methods have become increasingly popular in the theory and practice of
statistics. This is partly because even with little or no prior information, one often
can employ noninformative priors to draw a reliable inference. Thus, various
suggestions have been advanced for determining a noninformative prior.

The simplest and earliest noninformative prior, due to Laplace (1812), is the
uniform prior over the entire parameter space. Despite its frequent use, the uniform
prior typically does not lead to another uniform prior under one-to-one
transformations. For instance, a uniform prior for the population variance does not lead to a
uniform prior for the population standard deviation. As a remedy, Jeffreys (1961)
proposed a prior that is invariant under one-to-one transformations. This prior is
proportional to the positive square root of the determinant of the Fisher information
matrix. Despite its success in the one-parameter case, Jeffreys’ prior often is subject
to serious difficulties in the presence of nuisance parameter(s). For example, Jeffreys’
prior leads to an inconsistent estimator of the error variance in the balanced one-way
normal ANOVA model when the number of cells grows to infinity in direct
proportion to the sample size $n$ (Berger and Bernardo, 1992a). That is, it fails to avoid the
Neyman-Scott phenomenon. A second example is the product of two independent
normal means problem, where a circularly symmetric prior is found to be superior to
Jeffreys’ prior, which in this case is a flat prior (Efron, 1986).
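
The lack of invariance of the uniform prior can be seen in a small simulation. The sketch below draws the variance uniformly and looks at the induced distribution of the standard deviation. Since a flat prior on $(0, \infty)$ is improper, the variance is truncated to $(0, 4)$ here; this truncation point, the seed, and the helper name `fraction_below` are assumptions made purely for illustration.

```python
import random
from math import sqrt


def fraction_below(samples, cutoff):
    """Proportion of samples strictly below the cutoff."""
    return sum(s < cutoff for s in samples) / len(samples)


rng = random.Random(2000)   # fixed seed so the run is reproducible
upper_var = 4.0             # hypothetical truncation point for the variance
variances = [rng.uniform(0.0, upper_var) for _ in range(100_000)]
std_devs = [sqrt(v) for v in variances]

# Lower half of the variance range (0, 4): theoretical probability 1/2.
frac_var = fraction_below(variances, upper_var / 2.0)
# Lower half of the standard deviation range (0, 2): theoretical probability 1/4,
# because P(sd < 1) = P(variance < 1).
frac_sd = fraction_below(std_devs, sqrt(upper_var) / 2.0)

print(frac_var, frac_sd)
```

If the induced prior on the standard deviation were uniform, the second fraction would also be close to one half; instead the induced density is proportional to the standard deviation itself.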

Bernardo  (1979)  introduced  the  reference  prior  considering 


as an entropy measure. The larger this quantity, the less informative the prior.
The reference prior is the prior that maximizes this entropy measure. The reference
prior approach derives noninformative priors in multiparameter situations by
dividing the parameter vector into the parameter of interest and nuisance parameters.
The  idea  was  extended  further  and  generalized  in  a series  of  articles  by  Berger  and 
Bernardo  (1989,  1992a,  1992b)  who  suggested  splitting  the  parameter  vector  into 
two  or  more  groups  according  to  their  order  of  importance.  They  also  prescribed  a 
general  algorithm  for  the  construction  of  reference  priors.  Datta  and  M.  Ghosh  (1995) 
simplified  the  construction  of  m-group  reference  priors  for  block  diagonal  information 
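matrices. Suppose that the parameter vector $\theta = (\theta_1, \ldots, \theta_p)^T \in \Theta$ is grouped as
$\theta = (\theta_{(1)}, \ldots, \theta_{(m)})$ according to order of importance, where $\theta_{(i)}$ has $n_i$ coordinates and
$\sum_{i=1}^{m} n_i = p$. We assume that the Fisher information matrix $I(\theta)$ of $\theta$ is

$$I(\theta) = \text{block diagonal}\,(h_1(\theta), \ldots, h_m(\theta)),$$

where $h_j(\theta)$ is $n_j \times n_j$ (may not be diagonal). Let $\theta_{(j)}^C = (\theta_{(1)}, \ldots, \theta_{(j-1)}, \theta_{(j+1)}, \ldots, \theta_{(m)})$,
$j = 1, \ldots, m$. Assume that

$$|h_j(\theta)| = h_{j1}(\theta_{(j)})\, h_{j2}(\theta_{(j)}^C)$$

for nonnegative functions $h_{j1}$ and $h_{j2}$. Also, take the sequence of rectangular compact
sets

$$A^l = A_1^l \times \cdots \times A_m^l = \{\theta : \theta_{(j)} \in A_j^l,\ j = 1, \ldots, m\},$$

$A_j^l$ being increasing compact sets for $\theta_{(j)}$ ($j = 1, \ldots, m$). Then, the $m$-group reference
prior is given by

$$\pi(\theta) = \prod_{j=1}^{m} h_{j1}^{1/2}(\theta_{(j)}).$$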

A somewhat different criterion for developing noninformative priors is based on
matching the posterior coverage probability of a Bayesian credible set with the
corresponding frequentist coverage probability. Such a prior is often satisfactory from
both the Bayesian and the frequentist perspectives. This matching is accomplished
through posterior quantiles, highest posterior density (HPD) regions, or inversion
of certain test statistics. First, we discuss matching through posterior quantiles.

Let $\{X_i\}$, $i \ge 1$, be a sequence of independent and identically distributed, possibly
vector-valued, random variables with common density $f(x; \theta)$, where $\theta = (\theta_1, \ldots, \theta_p)^T$
belongs to some open subset of $R^p$, and $\theta_1$ is the parameter of interest. We write
$X = (X_1, \ldots, X_n)^T$, where $n$ is the sample size, and denote by $P^{\pi}(\cdot|X)$ the posterior
probability measure for $\theta$ under a prior $\pi(\theta)$. Also let $\theta_1^{1-\alpha}(\pi, X)$ be the posterior
$(1 - \alpha)$th quantile, that is, $P^{\pi}[\theta_1 \le \theta_1^{1-\alpha}(\pi, X) \mid X] = 1 - \alpha$. The first and second
order probability matching priors are introduced as follows:

If a prior $\pi(\theta)$ satisfies

$$P_\theta\left[\theta_1 \le \theta_1^{1-\alpha}(\pi, X)\right] = 1 - \alpha + o(n^{-1/2}), \qquad (1.1)$$

then it is called a first order probability matching prior.
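
The matching property can be illustrated by simulation in a case where it holds exactly. The sketch below assumes i.i.d. $N(\theta, 1)$ observations with a flat prior on $\theta$ (Jeffreys’ prior for a location parameter), so that the posterior is $N(\bar{x}, 1/n)$ and the posterior $(1-\alpha)$th quantile is $\bar{x} + z_{1-\alpha}/\sqrt{n}$. The function name `quantile_coverage`, the true value $\theta = 2$, the sample size, and the seed are all hypothetical choices for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def quantile_coverage(theta, n, alpha=0.05, reps=20_000, seed=0):
    """Monte Carlo estimate of P_theta[theta <= posterior (1 - alpha) quantile]
    for N(theta, 1) data under a flat prior on theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        upper = fmean(sample) + z / sqrt(n)   # posterior (1 - alpha) quantile
        hits += theta <= upper
    return hits / reps


coverage = quantile_coverage(theta=2.0, n=5)
print(coverage)
```

In this example the frequentist coverage equals $1 - \alpha$ for every $n$, so the estimate should fall near 0.95 up to Monte Carlo error; in general models the agreement in (1.1) holds only asymptotically.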

A brief introduction to matching priors is given by Lindley (1958), but its actual
study begins with Welch and Peers (1963), who considered the case $p = 1$, that is,
when the parameter of interest is real-valued and there is no nuisance parameter.
These authors showed in this case that the unique solution satisfying the first order
matching property is Jeffreys’ prior. In the presence of nuisance parameters, Peers
(1965) characterized the class of first order probability matching priors as solutions
of the partial differential equation

$$\sum_{j=1}^{p} \frac{\partial}{\partial\theta_j}\left\{ I^{j1} (I^{11})^{-1/2}\, \pi(\theta) \right\} = 0, \qquad (1.2)$$

where $I = I(\theta) = ((I_{ij}))$ is the Fisher information matrix per unit observation and
$I^{-1} = I^{-1}(\theta) = ((I^{ij}))$. Tibshirani (1989) characterized the class of first order
matching priors when $\theta_1$ is orthogonal (Cox and Reid, 1987) to the nuisance parameter
$(\theta_2, \ldots, \theta_p)$, i.e., $I_{1j} = 0$, $2 \le j \le p$. In this case, the class of first order matching
priors is characterized by

$$\pi(\theta) = I_{11}^{1/2}\, g(\theta_2, \ldots, \theta_p), \qquad (1.3)$$

where $g$ is an arbitrary function of $\theta_2, \ldots, \theta_p$ which is differentiable in its arguments.

Because of the close connection between the posterior cdf and posterior quantiles,
one may also wish to study matching priors in terms of the posterior cdf. Datta and J.K. Ghosh
(1995) considered a situation where interest lies in a one dimensional parametric
function, say $t(\theta)$. They characterized the class of first order probability matching
priors of $t(\theta)$ as solutions of

$$\sum_{j=1}^{p} \frac{\partial}{\partial\theta_j}\left\{ \eta_j(\theta)\, \pi(\theta) \right\} = 0, \qquad (1.4)$$

where

$$\eta(\theta) = (\eta_1(\theta), \ldots, \eta_p(\theta))^T = \frac{I^{-1}(\theta)\, \nabla t(\theta)}{\sqrt{\nabla t(\theta)^T\, I^{-1}(\theta)\, \nabla t(\theta)}}$$

and $\nabla t(\theta) = (\partial t(\theta)/\partial\theta_1, \ldots, \partial t(\theta)/\partial\theta_p)^T$.

In the multi-parameter case, interest may lie in more than one component or, more
generally, in more than one parametric function. In this case, Datta (1996)
provided the necessary and sufficient conditions under which a prior that is
simultaneously a probability matching prior for each parameter will also be
jointly probability matching.

Because $g$ in (1.3) is arbitrary, there are infinitely many first order probability
matching priors in the presence of nuisance parameters. With the purpose of
narrowing down the selection of priors within this class, Mukerjee and Dey (1993) developed
second order probability matching priors that satisfy

$$P_\theta\left[\theta_1 \le \theta_1^{1-\alpha}(\pi, X)\right] = 1 - \alpha + o(n^{-1}). \qquad (1.5)$$



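They considered only one nuisance parameter. This narrows down the selection of
priors, and often leads to a unique prior within the class of first order probability
matching priors. For an arbitrary number of nuisance parameters, Mukerjee and M.
Ghosh (1997) showed that a second order matching prior is obtained by solving a
second partial differential equation

$$\sum_{j=1}^{p}\sum_{r=1}^{p} \frac{\partial^2}{\partial\theta_j\,\partial\theta_r}\left\{\tau^{jr}\,\pi(\theta)\right\} - \frac{1}{3}\sum_{v=1}^{p}\sum_{j=1}^{p}\sum_{r=1}^{p}\sum_{s=1}^{p} \frac{\partial}{\partial\theta_v}\left\{L_{jrs}\,\tau^{jr}\,(3I^{sv} - 2\tau^{sv})\,\pi(\theta)\right\} = 0 \qquad (1.6)$$

in addition to (1.2). In the above, for $1 \le j, r, s \le p$, $\tau^{jr} = I^{j1} I^{r1} / I^{11}$ and
$L_{jrs} = E_\theta\{\partial^3 \log f(X_1; \theta)/\partial\theta_j\,\partial\theta_r\,\partial\theta_s\}$. Under parametric orthogonality and (1.2),
(1.6) simplifies to

$$\frac{1}{6}\, g(\theta_2, \ldots, \theta_p)\, \frac{\partial}{\partial\theta_1}\left\{I_{11}^{-3/2} L_{1,1,1}\right\} + \sum_{v=2}^{p}\sum_{s=2}^{p} \frac{\partial}{\partial\theta_v}\left\{I_{11}^{-1/2} L_{11s}\, I^{sv}\, g(\theta_2, \ldots, \theta_p)\right\} = 0, \qquad (1.7)$$

where $L_{1,1,1} = E_\theta\{\partial \log f(X_1; \theta)/\partial\theta_1\}^3$. In the particular case of one nuisance
parameter, (1.7) reduces to

$$\frac{1}{6}\, g(\theta_2)\, \frac{\partial}{\partial\theta_1}\left\{I_{11}^{-3/2} L_{1,1,1}\right\} + \frac{\partial}{\partial\theta_2}\left\{I_{11}^{-1/2} L_{112}\, I_{22}^{-1}\, g(\theta_2)\right\} = 0. \qquad (1.8)$$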

Next we discuss the matching through the HPD region. We continue with the
same setup, and denote by $\pi(\theta_1|X)$ the posterior density of $\theta_1$ given $X$ under
a prior $\pi(\theta)$. Let $\omega_\alpha = \omega_\alpha(\pi, X)$ be such that $P^{\pi}(\pi(\theta_1|X) \ge \omega_\alpha \mid X) = 1 - \alpha$.
The HPD region for $\theta_1$ with posterior coverage probability $1 - \alpha$ is given by $H_\alpha(\pi, X) =
\{\theta_1 : \pi(\theta_1|X) \ge \omega_\alpha\}$. DiCiccio and Stern (1994) and J.K. Ghosh and Mukerjee (1995)
have characterized priors $\pi$ for which

$$P\left[\theta_1 \in H_\alpha(\pi, X) \mid \theta\right] = 1 - \alpha + o(n^{-1}). \qquad (1.9)$$

In the presence of nuisance parameters, J.K. Ghosh and Mukerjee (1995) characterize
the priors that satisfy (1.9) and also discuss the orthogonal parameterization.




Probability matching priors are invariant under parametric transformation (Datta
and M. Ghosh, 1996; Mukerjee and M. Ghosh, 1997), while reference priors are
invariant only under parametric transformations within each parameter group. In
probability matching priors, “invariance” means that if $\pi_\theta(\theta)$, the pdf of $\theta = (\theta_1, \ldots, \theta_p)$,
is a probability matching prior for the real-valued parameter of interest $t(\theta)$,
and $\psi = (\psi_1, \ldots, \psi_p) = (k_1(\theta), \ldots, k_p(\theta))$ is a one-to-one transformation of $\theta$, then
the transformed prior $\pi_\psi(\psi)$ obtained from $\pi_\theta(\theta)$ by a change of variables will also
be a probability matching prior for $\tau(\psi) = t(\theta)$ under the $\psi$ parameterization.

All  studies  of  the  probability  matching  priors  described  above  are  based  on  the 
assumption  that  observations  are  i.i.d.  Lee  (1989)  discussed  probability  matching 
priors  in  non  i.i.d.  cases.  His  derivations  are  based  heavily  on  Durbin’s  work  (1980), 
which  assumes  that  maximum  likelihood  estimators  are  sufficient  statistics.  Such 
a requirement  usually  does  not  hold  outside  the  exponential  family.  More  recently, 
Yin  (1998)  developed  probability  matching  priors  for  independent  but  not  necessarily 
i.i.d.  random  variables. 


1.3  The  Subject  of  This  Dissertation 

The present work attempts Bayesian analysis of three different models: an
intraclass model, a first order autoregressive model, and familial data models. To this
end, we use certain noninformative priors including the widely used Jeffreys’ prior
as well as probability matching priors and the different reference priors of Berger and
Bernardo (1989, 1992a, 1992b). As mentioned in Ghosh, J.K. (1994, p. 86), there are
usually four criteria associated with the development of noninformative priors. These
are (i) maximization of entropy or minimization of information; (ii) matching
asymptotically the coverage probabilities of Bayesian credible sets with the corresponding
frequentist probabilities; (iii) the principle of group invariance; and (iv) minimaxity of
Bayesian procedures. Of these, (i) and (ii) have the widest applicability in the Bayesian
literature. The reference priors satisfy criterion (i). On the other hand, probability
matching priors are based on criterion (ii). In many situations, the same
noninformative prior is optimal according to more than one of these criteria, but that is not always
true.

The organization of the remaining chapters is as follows. In Chapter 2, we develop
some noninformative priors for the intraclass correlation coefficient in the intraclass
model. We consider probability matching priors and reference priors along with
Jeffreys’ prior. The one-at-a-time reference prior turns out to be a second order
matching prior based on matching posterior quantiles and also a matching prior based
on inversion of the CLR statistic. That is, the one-at-a-time reference prior emerges
as “optimal” according to several criteria. We prove the propriety of posterior
distributions under Jeffreys’ and reference priors.

Also, in Chapter 2, we compare two nested models such as the intraclass and
independence models using the distance or divergence between the two as the basis
of comparison. A suitable criterion for this is the “power divergence measure” as
introduced by Cressie and Read (1984). Such a measure includes the two
Kullback-Leibler divergence measures and the Hellinger divergence measure as special cases.
The Kullback-Leibler and Hellinger divergence measures turn out to be convex
functions solely of $\rho$, the intraclass correlation coefficient, with their minimum attained
at $\rho = 0$. Thus the model comparison problem in this case amounts to testing the
hypothesis $H_0: \rho = 0$ against $H_1: \rho \neq 0$. Because of the duality between hypothesis
tests and set estimation, the hypothesis testing problem also can be solved by solving a
corresponding set estimation problem. Thus, we construct Bayesian credible intervals
based on three different types of criteria: (i) equal two-tailed, (ii) HPD, and (iii) power
divergence measure. An example is considered where the HPD interval based on the
one-at-a-time reference prior turns out to be the shortest among the credible intervals
having the same coverage probability.

In Chapter 3, we consider autoregressive models that are used routinely to
analyze time series data. We develop noninformative priors when the parameter of
interest is the autocorrelation coefficient in first order normal autoregressive models.
Jeffreys’ prior as well as reference priors are found. We establish the propriety of the
posteriors under Jeffreys’ and reference priors. These priors are compared in light
of how accurately the coverage probabilities of Bayesian credible intervals match the
corresponding frequentist coverage probabilities. The reference priors have a definite
edge over Jeffreys’ prior in this respect.

As in Chapter 2, we compare the autoregressive models with independence models
by constructing three different types of Bayesian credible intervals. A numerical
example shows that the credible interval based on the two- or three-group reference
prior with equal tail probabilities is shorter than the one based on the
Kullback-Leibler divergence measure with the same posterior coverage probability.

In Chapter 4, we find the reference priors along with Jeffreys’ prior for the
analysis of familial data with applications in genetics. The five group reference prior is
marginally a first order probability matching prior (to be defined in Chapter 4), but
the two group reference prior and Jeffreys’ prior are not. We prove the propriety of
the posteriors both under Jeffreys’ prior and under the reference priors. Also, in
this chapter, we undertake a simulation study to compare the proposed noninformative
priors in terms of frequentist coverage probability. We also outline how the
Bayesian method can be implemented via Gibbs sampling.

Chapter  5 contains  a summary  and  some  ideas  for  future  research. 


CHAPTER  2 
INTRACLASS  MODELS 

2.1  Introduction 


The objective of this chapter is twofold. The first is to find a Bayesian credible
interval for the intraclass correlation coefficient in symmetric normal models based
on some “default” or “noninformative” prior. The second objective is to compare two
nested models such as the intraclass and independence models using the distance or
divergence between the two as the basis of comparison. A suitable criterion for this
is the “power divergence measure” as introduced by Cressie and Read (1984). Such
a measure includes the two Kullback-Leibler divergence measures and the Hellinger
divergence measure as special cases. The Kullback-Leibler and Hellinger divergence
measures turn out to be convex functions solely of $\rho$, the intraclass correlation
coefficient, with their minimum attained at $\rho = 0$. Thus the model comparison problem
in this case amounts to testing the hypothesis $H_0: \rho = 0$ against $H_1: \rho \neq 0$. Because
of the duality between hypothesis tests and set estimation, the hypothesis testing
problem also can be solved by solving a corresponding set estimation problem. The
present chapter develops Bayesian methods based on a power divergence measure,
rejecting $H_0: \rho = 0$ if the Bayesian credible interval for $\rho$ does not contain 0, the
interval being chosen so that the resulting credible interval for the divergence measure
has a specified coverage probability of $1 - \alpha$. The length of such an interval is compared
with (i) the equal two-tailed credible interval and (ii) the HPD credible interval for $\rho$
with the same coverage probability, each of which can also be inverted into an
acceptance region for $H_0: \rho = 0$.
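
The claim that the divergence depends on $\rho$ alone and is minimized at $\rho = 0$ can be checked numerically for one of the two Kullback-Leibler divergences. The sketch below assumes that the intraclass and independence models share the same mean and the same $\sigma^2$, with covariance matrices $\sigma^2 V$ and $\sigma^2 I_k$, where $V = (1-\rho)I_k + \rho J_k$; under that assumption the divergence from the intraclass model to the independence model equals $-\frac{1}{2}\log|V|$. The function name and the choice $k = 4$ are hypothetical, and the precise definitions used in this chapter appear in the section on divergence measures.

```python
from math import log


def kl_intraclass_vs_independence(rho, k):
    """KL( N(mu, s2*V) || N(mu, s2*I_k) ) with V = (1 - rho) I_k + rho J_k.

    With a common mean and common s2 this equals -0.5 * log|V|, where
    |V| = (1 - rho)^(k - 1) * (1 + (k - 1) * rho).
    """
    if not (-1.0 / (k - 1) < rho < 1.0):
        raise ValueError("rho must lie in (-1/(k-1), 1) for V to be positive definite")
    log_det_v = (k - 1) * log(1.0 - rho) + log(1.0 + (k - 1) * rho)
    return -0.5 * log_det_v


k = 4
grid = [i / 100 for i in range(-30, 96)]   # equally spaced values inside (-1/3, 1)
values = [kl_intraclass_vs_independence(r, k) for r in grid]

# Location of the smallest divergence on the grid.
argmin = grid[values.index(min(values))]
# Midpoint convexity on the equally spaced grid.
midpoint_convex = all(
    values[i] <= 0.5 * (values[i - 1] + values[i + 1]) + 1e-12
    for i in range(1, len(values) - 1)
)
print(argmin, midpoint_convex)
```

On this grid the divergence vanishes at $\rho = 0$, is positive elsewhere, and passes the midpoint convexity check, which is consistent with the reduction of the model comparison problem to a test of $H_0: \rho = 0$.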




The outline of the remaining sections is as follows. In Section 2.2, we develop
noninformative priors based on matching asymptotically the coverage probabilities of
Bayesian credible intervals with the corresponding frequentist coverage probabilities.
This matching is accomplished through (i) the posterior quantiles, (ii) the
highest posterior density (HPD) regions, or (iii) the inversion of certain test statistics.
Among these, matching based on posterior quantiles is used most widely. Our work
also focuses on this, although we judge the performance of such priors in the light
of (ii) and (iii). Also, in this section, we develop certain reference priors following
the algorithm of Bernardo (1979) or Berger and Bernardo (1989, 1992a, 1992b). In
the present example, the one-at-a-time reference prior turns out to be a second order
matching prior, and it is different from Jeffreys’ prior. The latter is not a second
order matching prior in this example.

In Section 2.3, we establish propriety of posteriors under a general class of priors that
includes the one-at-a-time reference prior and Jeffreys’ prior under certain conditions.
In Section 2.4, we perform a small simulation study comparing the one-at-a-time
reference prior with Jeffreys’ prior. The former is found to meet the target
frequentist coverage probabilities more accurately than the latter, especially for small
and moderate sample sizes.

Section 2.5 addresses the model choice or hypothesis testing problem. We compare
the intraclass model and the independence model by constructing a Bayesian credible
interval on the basis of a power divergence measure. The length of such an interval
is compared with (i) the equal two-tailed credible interval and (ii) the HPD credible
interval for $\rho$ with the same coverage probability. A numerical example is considered
demonstrating a slight edge of the HPD interval. Finally, some concluding remarks
are made in Section 2.6.




2.2  Noninformative  Priors 
2.2.1  Fisher  Information  Matrix 


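We consider the intraclass model $Y = X\beta + e$, where $e$ is a $k \times 1$ random vector
distributed as $N_k(0, \sigma^2 V)$, $V = (1-\rho)I_k + \rho J_k$, $I_k$ being the identity matrix of
order $k$ and $J_k$ being the $k \times k$ matrix with each element equal to 1. Here $\beta$ is the $p \times 1$
vector of unknown regression parameters, and $X$ is the known $k \times p$ design matrix of
rank $p$ $(< k)$. We note that if $\rho = 0$, then this model reduces to the linear regression
model with independent errors. For the given intraclass model, the pdf of $Y$ is given
by

$$
\begin{aligned}
f(Y; \rho, \sigma^2, \beta) &= (2\pi)^{-k/2} |\sigma^2 V|^{-1/2} \exp\left[-\frac{(Y - X\beta)^T V^{-1} (Y - X\beta)}{2\sigma^2}\right] \\
&= (2\pi)^{-k/2} (\sigma^2)^{-k/2} (1-\rho)^{-(k-1)/2} (1 + (k-1)\rho)^{-1/2} \\
&\quad \times \exp\left[-\frac{1}{2\sigma^2(1-\rho)} \left\{ \|Y - X\beta\|^2 - \frac{\rho}{1 + (k-1)\rho} \{1^T (Y - X\beta)\}^2 \right\}\right],
\end{aligned}
\tag{2.1}
$$

where $1$ is a $k$-component vector with each element equal to 1.

The parameter of interest is $\rho$, the intraclass correlation coefficient. To obtain the
noninformative priors, first we find the per observation Fisher information matrix
$I(\rho, \sigma^2, \beta)$ as

$$
I = I(\rho, \sigma^2, \beta) =
\begin{pmatrix}
\dfrac{k(k-1)[1 + (k-1)\rho^2]}{2(1-\rho)^2 [1 + (k-1)\rho]^2} & \dfrac{-k(k-1)\rho}{2\sigma^2 (1-\rho)[1 + (k-1)\rho]} & 0^T \\
\dfrac{-k(k-1)\rho}{2\sigma^2 (1-\rho)[1 + (k-1)\rho]} & \dfrac{k}{2\sigma^4} & 0^T \\
0 & 0 & t(\rho, \sigma^2)
\end{pmatrix},
\tag{2.2}
$$

where $t(\rho, \sigma^2) = \dfrac{1}{\sigma^2} X^T V^{-1} X = \dfrac{1}{\sigma^2 (1-\rho)} \left[ X^T X - \dfrac{\rho}{1 + (k-1)\rho} X^T J_k X \right]$.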



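Next, we find an orthogonal reparameterization (Cox and Reid, 1987) of $(\rho, \sigma^2, \beta)$.
To this end, let $\theta_1 = \rho$, $\theta_2 = g(\rho, \sigma^2)$, and $\theta_3 = \beta$, and we obtain a solution for

$$
I(\rho, \sigma^2, \beta) =
\begin{pmatrix} 1 & \partial g/\partial\rho & 0^T \\ 0 & \partial g/\partial\sigma^2 & 0^T \\ 0 & 0 & I_p \end{pmatrix}
\begin{pmatrix} I_{\theta_1\theta_1} & 0 & 0^T \\ 0 & I_{\theta_2\theta_2} & 0^T \\ 0 & 0 & I_{\theta_3\theta_3} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0^T \\ \partial g/\partial\rho & \partial g/\partial\sigma^2 & 0^T \\ 0 & 0 & I_p \end{pmatrix}.
\tag{2.3}
$$

This leads to the equations

$$
\begin{aligned}
\text{(a)}\quad & I_{\theta_1\theta_1} + \left(\frac{\partial g}{\partial\rho}\right)^2 I_{\theta_2\theta_2} = \frac{k(k-1)[1 + (k-1)\rho^2]}{2(1-\rho)^2 [1 + (k-1)\rho]^2}, \\
\text{(b)}\quad & \left(\frac{\partial g}{\partial\rho}\right)\left(\frac{\partial g}{\partial\sigma^2}\right) I_{\theta_2\theta_2} = \frac{-k(k-1)\rho}{2\sigma^2 (1-\rho)[1 + (k-1)\rho]}, \\
\text{and (c)}\quad & \left(\frac{\partial g}{\partial\sigma^2}\right)^2 I_{\theta_2\theta_2} = \frac{k}{2\sigma^4}.
\end{aligned}
$$

From (a)-(c), one gets

$$
\frac{\partial g/\partial\rho}{\partial g/\partial\sigma^2} = -\frac{(k-1)\rho\,\sigma^2}{(1-\rho)[1 + (k-1)\rho]},
$$

and a solution is given by $g = \sigma^2 (1-\rho)^{(k-1)/k} (1 + (k-1)\rho)^{1/k}$. Thus,

$$
\theta_1 = \rho, \quad \theta_2 = \sigma^2 (1-\rho)^{(k-1)/k} (1 + (k-1)\rho)^{1/k}, \quad \theta_3 = \beta.
$$

With this orthogonal transformation, the per unit Fisher information matrix reduces
to

$$
I(\theta) = I(\theta_1, \theta_2, \theta_3) = \mathrm{Diag}\left[ \frac{k(k-1)}{2(1-\theta_1)^2 (1 + (k-1)\theta_1)^2},\ \frac{k}{2\theta_2^2},\ h(\theta_1, \theta_2) \right],
\tag{2.4}
$$

where $h(\theta_1, \theta_2) = \theta_2^{-1} (1-\theta_1)^{(k-1)/k} [1 + (k-1)\theta_1]^{1/k} X^T V^{-1} X$.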
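The orthogonality claimed for $\theta_2 = \sigma^2(1-\rho)^{(k-1)/k}(1+(k-1)\rho)^{1/k}$ can be checked numerically. The sketch below is a minimal illustration, not the author's code: it transforms the $(\rho, \sigma^2)$ block of (2.2) to the $(\theta_1, \theta_2)$ scale with a finite-difference Jacobian and returns the resulting $2 \times 2$ matrix, whose off-diagonal entry should vanish and whose diagonal should match the first two entries of (2.4). All function names and the step size `eps` are assumptions made for this example.

```python
def info_rho_sigma2(rho, sigma2, k):
    """(rho, sigma^2) block of the Fisher information in (2.2)."""
    d = (1.0 - rho) * (1.0 + (k - 1) * rho)
    i_rr = k * (k - 1) * (1.0 + (k - 1) * rho**2) / (2.0 * d * d)
    i_rs = -k * (k - 1) * rho / (2.0 * sigma2 * d)
    i_ss = k / (2.0 * sigma2**2)
    return [[i_rr, i_rs], [i_rs, i_ss]]


def g(rho, sigma2, k):
    """Candidate orthogonal parameter theta_2 = g(rho, sigma^2)."""
    return sigma2 * (1.0 - rho) ** ((k - 1) / k) * (1.0 + (k - 1) * rho) ** (1.0 / k)


def info_theta(rho, sigma2, k, eps=1e-6):
    """Information for (theta_1, theta_2) = (rho, g(rho, sigma^2))."""
    # Partial derivatives of g by central differences.
    dg_drho = (g(rho + eps, sigma2, k) - g(rho - eps, sigma2, k)) / (2.0 * eps)
    dg_ds = (g(rho, sigma2 + eps, k) - g(rho, sigma2 - eps, k)) / (2.0 * eps)
    # A = d(rho, sigma^2) / d(theta_1, theta_2), the inverse of the Jacobian of theta.
    a = [[1.0, 0.0], [-dg_drho / dg_ds, 1.0 / dg_ds]]
    i = info_rho_sigma2(rho, sigma2, k)
    # I_theta = A^T I A, written out for 2 x 2 matrices.
    ia = [[sum(i[r][m] * a[m][c] for m in range(2)) for c in range(2)] for r in range(2)]
    return [[sum(a[m][r] * ia[m][c] for m in range(2)) for c in range(2)] for r in range(2)]
```

A finite-difference Jacobian is used on purpose so that the check does not rely on the same algebra as the derivation it is testing.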




2.2.2  Quantile  Matching  Priors 

We begin by discussing quantile matching priors. For the specific intraclass model,
$\rho = \theta_1$ is the parameter of interest, and $\theta_1$ is orthogonal to $(\theta_2, \theta_3)$. Hence, by
Tibshirani (1989), the class of first order probability matching priors is characterized
by

$$
\pi_1(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} g(\theta_2, \theta_3), \tag{2.5}
$$

where $g(\cdot)$ is an arbitrary positive function of $\theta_2$ and $\theta_3$, differentiable in its arguments.

Clearly, there are infinitely many first order matching priors. It is possible to narrow
down the selection of priors in this class by requiring the second order matching
property (Mukerjee and Dey, 1993; Mukerjee and M. Ghosh, 1997). Since $\theta_3$ is the
regression coefficient, we find it convenient to consider the subclass $g(\theta_2, \theta_3) \propto h(\theta_2)$.
Since  parametric  orthogonality  holds  and  ® ~ 1>2),  following 

Mukerjee  and  M.  Ghosh  (1997),  a second  order  matching  prior  within  the  above 
subclass  is  found  as  a solution  of 

I {/n = 0,  (2.6) 

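where $I^{-1}(\theta) = ((I^{ij}))$, $L_{1,1,1} = E_\theta\{\partial \log f(Y; \theta)/\partial\theta_1\}^3$, and $L_{112} = E_\theta\{\partial^3 \log f(Y; \theta)/\partial\theta_1^2\,\partial\theta_2\}$.
The following theorem provides the unique solution $h(\theta_2)$ to (2.6).

Theorem 1 The unique solution to (2.6) is given by $h(\theta_2) \propto \theta_2^{-1}$. Thus, the unique
second order matching prior within the above subclass is given by

$$
\pi_2(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} \theta_2^{-1}. \tag{2.7}
$$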




Remark 1 Because of invariance of probability matching priors (Datta and M. Ghosh,
1996; Mukerjee and M. Ghosh, 1997), the second order matching prior in the $(\rho, \sigma^2, \beta)$
parameterization reduces to

$$
\pi(\rho, \sigma^2, \beta) \propto (\sigma^2)^{-1} (1-\rho)^{-1} (1 + (k-1)\rho)^{-1}. \tag{2.8}
$$
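The passage from (2.7) to (2.8) can be spelled out as a change of variables. The following is a worked sketch that uses only the transformation $\theta_2 = \sigma^2(1-\rho)^{(k-1)/k}(1+(k-1)\rho)^{1/k}$ from Section 2.2.1; the shorthand $c(\rho)$ is introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
% c(\rho) is shorthand introduced for this sketch only
c(\rho) &= (1-\rho)^{(k-1)/k}\,(1+(k-1)\rho)^{1/k}, \qquad
\theta_1 = \rho,\quad \theta_2 = \sigma^2 c(\rho),\quad \theta_3 = \beta, \\
% the map is triangular, so the Jacobian determinant is the product of the diagonal terms
\left|\frac{\partial(\theta_1,\theta_2,\theta_3)}{\partial(\rho,\sigma^2,\beta)}\right|
  &= 1 \cdot \frac{\partial\theta_2}{\partial\sigma^2} \cdot 1 = c(\rho), \\
% substitute into (2.7) and multiply by the Jacobian
\pi(\rho,\sigma^2,\beta)
  &\propto (1-\rho)^{-1}(1+(k-1)\rho)^{-1}\,\{\sigma^2 c(\rho)\}^{-1}\, c(\rho)
  = (\sigma^2)^{-1}(1-\rho)^{-1}(1+(k-1)\rho)^{-1}.
\end{align*}
```

The factor $c(\rho)$ cancels exactly, which is why the prior keeps the same dependence on $\rho$ in both parameterizations.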


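Proof of Theorem 1. With the parameterization $\theta = (\theta_1, \theta_2, \theta_3)$ from $(\rho, \sigma^2, \beta)$, the
likelihood in (2.1) reduces to

$$
f(Y; \theta_1, \theta_2, \theta_3) = (2\pi)^{-k/2} \theta_2^{-k/2}
\exp\left[ -\frac{1}{2\theta_2} \left\{ \frac{1 + (k-1)\theta_1}{1 - \theta_1} \right\}^{1/k}
(Y - X\theta_3)^T \left\{ I_k - \frac{\theta_1}{1 + (k-1)\theta_1} J_k \right\} (Y - X\theta_3) \right].
$$

Then, after much algebraic simplification, it follows that

$$
\frac{\partial \log f}{\partial \theta_1} = -\frac{1}{2\theta_2} \cdot \frac{[1 + (k-1)\theta_1]^{1/k - 2}}{(1 - \theta_1)^{1/k + 1}}
\left\{ (Y - X\theta_3)^T A_k (Y - X\theta_3) \right\}, \tag{2.9}
$$

where $A_k = [1 + (k-1)\theta_1] I_k - J_k$.

Further,

$$
Y - X\theta_3 \sim N\left(0,\ \sigma^2 \{(1 - \theta_1) I_k + \theta_1 J_k\}\right); \qquad
E\left[ (Y - X\theta_3)^T A_k (Y - X\theta_3) \right] = 0.
$$

Thus, from (59), p. 67 of Searle (1971), it follows that

$$
L_{1,1,1} = -\frac{[1 + (k-1)\theta_1]^{3/k - 6}}{8\theta_2^3 (1 - \theta_1)^{3/k + 3}} \cdot 8\sigma^6\, \mathrm{tr}\{(A_k V)^3\},
\qquad \sigma^2 = \theta_2 (1-\theta_1)^{-(k-1)/k} [1 + (k-1)\theta_1]^{-1/k},
$$

which on simplification, reduces to

$$
L_{1,1,1} = k(k-1)(k-2)(1 - \theta_1)^{-3} (1 + (k-1)\theta_1)^{-3}. \tag{2.10}
$$

Hence, $I_{11}^{-3/2} L_{1,1,1}$ does not depend on $\theta_1$, and the first term in the left-hand side of (2.6)
is 0. Also, $L_{112} = \theta_2^{-1} I_{11}$, so that on simplification (2.6) reduces to $\frac{\partial}{\partial\theta_2}[\theta_2 h(\theta_2)] = 0$,
and hence $h(\theta_2) \propto \theta_2^{-1}$. This proves the theorem.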


2.2.3  Matching  Based  on  Highest  Posterior  Density  (HPD)  Regions 

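We would like to discuss matching through the highest posterior density (HPD)
region. The question is whether an HPD matching prior exists within the subclass
$\pi(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} h(\theta_2)$. Since $\theta_1$ is orthogonal to $(\theta_2, \theta_3)$
in the given intraclass model, from (4.1), p. 137 of J.K. Ghosh and Mukerjee (1995),
it follows that a prior $\pi$ satisfies the HPD matching condition if and only if

$$
\frac{\partial}{\partial\theta_1}\left\{ L_{111} I_{11}^{-2} \pi \right\}
+ \frac{\partial}{\partial\theta_2}\left\{ L_{112} I^{22} I_{11}^{-1} \pi \right\}
- \frac{\partial^2}{\partial\theta_1^2}\left\{ I_{11}^{-1} \pi \right\} = 0, \tag{2.11}
$$

where $L_{111} = E_\theta\{\partial^3 \log f(Y; \theta)/\partial\theta_1^3\}$. In this case,

$$
\begin{aligned}
\frac{\partial}{\partial\theta_1}\left\{ L_{111} I_{11}^{-2} \pi \right\} &= -\frac{12}{k} h(\theta_2), \\
\frac{\partial}{\partial\theta_2}\left\{ L_{112} I^{22} I_{11}^{-1} \pi \right\} &= \frac{2}{k} (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} \frac{\partial}{\partial\theta_2}\left[ \theta_2 h(\theta_2) \right], \\
\frac{\partial^2}{\partial\theta_1^2}\left\{ I_{11}^{-1} \pi \right\} &= -\frac{4}{k} h(\theta_2).
\end{aligned}
$$

Hence, the left-hand side of (2.11) equals

$$
-\frac{8}{k} h(\theta_2) + \frac{2}{k} (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} \frac{\partial}{\partial\theta_2}\left[ \theta_2 h(\theta_2) \right].
$$

The left-hand side of (2.11) is equal to 0 if and only if

$$
h(\theta_2) = \frac{1}{4} (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1} \frac{\partial}{\partial\theta_2}\left[ \theta_2 h(\theta_2) \right].
$$

Since the left-hand side depends on $\theta_2$ alone while the right-hand side also depends on $\theta_1$,
it implies that an HPD matching prior does not exist within the subclass.

Remark 2 The quantile second order matching prior derived is not an HPD matching
prior.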


2.2.4  Matching  Based  on  Inversion  of  Likelihood  Ratio  Statistics 

J.K. Ghosh and Mukerjee (1991) and Severini (1991) considered priors ensuring
frequentist validity up to $o(n^{-1})$ of the credible regions based on the inversion of
a posterior Bartlett corrected conditional likelihood ratio (CLR) test statistic when
there are no nuisance parameters. DiCiccio and Stern (1994) considered the more
general case which admits nuisance parameters. When there is only one real valued
parameter of interest that is orthogonal to the nuisance parameter, Yin and M. Ghosh
(1997) showed that a second order matching prior based on the posterior quantiles is
also a prior for which the Bayesian and frequentist Bartlett corrections for the CLR
statistic differ by $o(1)$ if and only if $I_{11}^{-3/2} L_{1,1,1}$ is independent of $\theta_1$. This being the
case here, the proposed second order matching prior is also a matching prior based
on inversion of the CLR statistic.


2.2.5  Reference  Priors 


Bernardo (1979), and later Berger and Bernardo (1989, 1992a, 1992b) developed
noninformative priors which have become known as "reference priors". These priors
are obtained by maximizing a suitable entropy distance. In view of the form of the
information matrix as given in (2.4), when $\theta_1$ is the parameter of interest, by taking
rectangular compacts for $\theta_1$, $\theta_2$ and $\theta_3$, it follows from Berger and Bernardo (1992a) that
the two group reference prior for $\{\theta_1, (\theta_2, \theta_3)\}$ is $\pi_R(\theta) \propto \theta_2^{-(p+2)/2} (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1}$,
while the one-at-a-time reference prior with ordering $\{\theta_1, \theta_2, \theta_3\}$ or $\{\theta_1, \theta_3, \theta_2\}$ is
$\pi_R(\theta) \propto \theta_2^{-1} (1-\theta_1)^{-1} (1 + (k-1)\theta_1)^{-1}$. Thus the one-at-a-time reference prior is
also a second order matching prior.


2.3  Propriety  of  the  Posterior  Distributions 

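First we find the marginal posterior distribution of $\rho$. Suppose $Y_1, Y_2, \ldots, Y_n$
are iid random vectors from the above intraclass model. The likelihood function is
given by

$$
L(\rho, \sigma^2, \beta) \propto (\sigma^2)^{-nk/2} (1-\rho)^{-n(k-1)/2} (1 + (k-1)\rho)^{-n/2}
\exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - X\beta)^T V^{-1} (Y_i - X\beta) \right]. \tag{2.12}
$$

One may recall that $V = (1-\rho) I_k + \rho J_k$. Writing $Y = (Y_1, \ldots, Y_n)$, under the
second order matching prior given in (2.8) the joint posterior density is given by

$$
\pi(\rho, \sigma^2, \beta \mid Y) \propto (\sigma^2)^{-(nk+2)/2} (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{-n/2 - 1}
\exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - X\beta)^T V^{-1} (Y_i - X\beta) \right]. \tag{2.13}
$$

Writing

$$
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} \bar{Y} \quad \left( \bar{Y} = n^{-1} \sum_{i=1}^n Y_i \right), \qquad
S(\rho) = \sum_{i=1}^n (Y_i - X\hat\beta)^T V^{-1} (Y_i - X\hat\beta),
$$

integration with respect to $\beta$ and $\sigma^2$ yields the marginal posterior pdf of $\rho$ as

$$
\pi(\rho \mid Y) \propto (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{-n/2 - 1} |X^T V^{-1} X|^{-1/2} (S(\rho))^{-(nk-p)/2}. \tag{2.14}
$$

The following theorem proves the propriety of $\pi(\rho \mid Y)$.

Theorem 2 $\pi(\rho \mid Y)$ is proper.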
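The integration that leads from (2.13) to (2.14) can be spelled out in three steps. This is a worked sketch of the standard Gaussian and inverse-gamma integrals involved, written in the notation of this section; it is offered as a reading aid rather than as the author's own derivation.

```latex
\begin{align*}
% Step 1: complete the square in \beta (the cross term vanishes because
% X^T V^{-1} (\bar{Y} - X\hat\beta) = 0)
\sum_{i=1}^n (Y_i - X\beta)^T V^{-1} (Y_i - X\beta)
  &= S(\rho) + n(\beta-\hat\beta)^T X^T V^{-1} X (\beta-\hat\beta), \\
% Step 2: Gaussian integral over \beta in R^p
\int \exp\Big\{-\frac{n}{2\sigma^2}(\beta-\hat\beta)^T X^T V^{-1} X (\beta-\hat\beta)\Big\}\, d\beta
  &\propto (\sigma^2)^{p/2}\, |X^T V^{-1} X|^{-1/2}, \\
% Step 3: the power of \sigma^2 is now -(nk+2)/2 + p/2 = -(nk-p)/2 - 1
\int_0^\infty (\sigma^2)^{-(nk-p)/2 - 1} \exp\Big\{-\frac{S(\rho)}{2\sigma^2}\Big\}\, d\sigma^2
  &\propto S(\rho)^{-(nk-p)/2}.
\end{align*}
```

Multiplying the factors in $\rho$ left over from (2.13) by the results of Steps 2 and 3 gives the right-hand side of (2.14).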

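Proof of Theorem 2. We consider partitioning the interval $(-(k-1)^{-1}, 1)$ for $\rho$ into
two intervals i) $-(k-1)^{-1} < \rho < 0$ and ii) $0 \le \rho < 1$.

Since $V = (1-\rho) I_k + \rho J_k$ is a non-negative definite matrix with eigenvalues $1 - \rho$
and $1 + (k-1)\rho$, $V^{-1} \le \max\{(1-\rho)^{-1}, (1 + (k-1)\rho)^{-1}\} I_k$.

Thus, if $-(k-1)^{-1} < \rho < 0$,

$$
\begin{aligned}
V^{-1} &\le \max\{(1-\rho)^{-1}, (1 + (k-1)\rho)^{-1}\} I_k = (1 + (k-1)\rho)^{-1} I_k; \\
|X^T V^{-1} X|^{-1/2} &\le |(1 + (k-1)\rho)^{-1} X^T X|^{-1/2} = (1 + (k-1)\rho)^{p/2} |X^T X|^{-1/2}; \\
(S(\rho))^{-\frac{1}{2}(nk-p)} &\le (1 + (k-1)\rho)^{\frac{1}{2}(nk-p)} \left[ \sum_{i=1}^n (Y_i - X\hat\beta)^T (Y_i - X\hat\beta) \right]^{-\frac{1}{2}(nk-p)}.
\end{aligned}
$$

Hence, for $-(k-1)^{-1} < \rho < 0$, the right hand side of (2.14) is bounded above by

$$
c_1 (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{n(k-1)/2 - 1} \le c_1 [1 + (k-1)\rho]^{n(k-1)/2 - 1},
$$

where $c_1$ is a positive constant not depending on $\rho$. Hence, the posterior marginal
density of $\rho$ is integrable on $(-(k-1)^{-1}, 0)$.

On the other hand, if $0 \le \rho < 1$,

$$
\begin{aligned}
V^{-1} &\le \max\{(1-\rho)^{-1}, (1 + (k-1)\rho)^{-1}\} I_k = (1-\rho)^{-1} I_k; \\
|X^T V^{-1} X|^{-1/2} &\le |(1-\rho)^{-1} X^T X|^{-1/2} = (1-\rho)^{p/2} |X^T X|^{-1/2}; \\
(S(\rho))^{-\frac{1}{2}(nk-p)} &\le (1-\rho)^{\frac{1}{2}(nk-p)} \left[ \sum_{i=1}^n (Y_i - X\hat\beta)^T (Y_i - X\hat\beta) \right]^{-\frac{1}{2}(nk-p)}.
\end{aligned}
$$

Thus, for $0 \le \rho < 1$, the right-hand side of (2.14) is bounded above by

$$
c_2 (1-\rho)^{n/2 - 1} [1 + (k-1)\rho]^{-n/2 - 1} \le c_2 (1-\rho)^{n/2 - 1},
$$

where $c_2$ is a positive constant not depending on $\rho$, so that the posterior marginal
pdf of $\rho$ is integrable on $(0, 1)$. The result follows.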

Next we consider the marginal pdf of $\rho$ under Jeffreys' prior. Jeffreys' prior
is defined as the positive square root of the determinant of the Fisher information
matrix. From (2.2), we obtain Jeffreys' prior as

$$
\pi_J(\rho, \sigma^2, \beta) \propto (\sigma^2)^{-(p+2)/2} (1-\rho)^{-1} (1 + (k-1)\rho)^{-1} |X^T V^{-1} X|^{1/2}. \tag{2.15}
$$

Thus, under Jeffreys' prior, the joint posterior pdf is given by

$$
\pi_J(\rho, \sigma^2, \beta \mid Y) \propto (\sigma^2)^{-(nk+p+2)/2} (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{-n/2 - 1} |X^T V^{-1} X|^{1/2}
\exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - X\beta)^T V^{-1} (Y_i - X\beta) \right]. \tag{2.16}
$$

Integration with respect to $\beta$ and $\sigma^2$ yields the marginal posterior pdf of $\rho$ as

$$
\pi_J(\rho \mid Y) \propto (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{-n/2 - 1} S(\rho)^{-nk/2}. \tag{2.17}
$$

Now, we prove the propriety of $\pi_J(\rho \mid Y)$. Note that

$$
\pi_J(\rho \mid Y) \le c_3 (1-\rho)^{-n(k-1)/2 - 1} (1 + (k-1)\rho)^{-n/2 - 1} \left\{ \max\left[ (1-\rho)^{-1}, (1 + (k-1)\rho)^{-1} \right] \right\}^{-nk/2},
$$

where $c_3$ $(> 0)$ depends on $Y$ and $X$ but not on $\rho$. Now, proceeding similarly as in
Theorem 2, the propriety of $\pi_J(\rho, \sigma^2, \beta \mid Y)$ follows.


2.4  Computer  Simulation 


2.4.1  Method 


Following Sun and Ye (1996), we compare the one-at-a-time reference prior ($\pi_R$) with
Jeffreys' prior ($\pi_J$) for the intraclass model by calculating the frequentist coverage
probability of the posterior tail probability of $\rho$.

We write

$$
F^{\pi_R}(z) = P^{\pi_R}(\rho < z \mid Y), \qquad F^{\pi_J}(z) = P^{\pi_J}(\rho < z \mid Y),
$$

and

$$
F^{\pi_R}(\rho_\alpha(\pi_R)) = F^{\pi_J}(\rho_\alpha(\pi_J)) = \alpha.
$$

That is, $\rho_\alpha(\pi_R)$ and $\rho_\alpha(\pi_J)$ denote the posterior $\alpha$-quantiles of $\rho$ given $Y = (Y_1, \ldots, Y_n)$,
under the priors $\pi_R$ and $\pi_J$, respectively.

The corresponding frequentist coverage probabilities are

$$
\begin{aligned}
P(\alpha; \pi_R, \rho, \sigma^2, \beta) &= P(\rho < \rho_\alpha(\pi_R) \mid \rho, \sigma^2, \beta) = P(F^{\pi_R}(\rho) < F^{\pi_R}(\rho_\alpha(\pi_R)) \mid \rho, \sigma^2, \beta); \\
P(\alpha; \pi_J, \rho, \sigma^2, \beta) &= P(\rho < \rho_\alpha(\pi_J) \mid \rho, \sigma^2, \beta) = P(F^{\pi_J}(\rho) < F^{\pi_J}(\rho_\alpha(\pi_J)) \mid \rho, \sigma^2, \beta).
\end{aligned}
$$

If the coverage probabilities $P(\alpha; \pi_R, \rho, \sigma^2, \beta)$ and $P(\alpha; \pi_J, \rho, \sigma^2, \beta)$ are close to $\alpha$
even if sample sizes are small, then we have evidence that the chosen priors perform
well with respect to the probability matching criterion.

$P(\alpha; \pi_R, \rho, \sigma^2, \beta)$ and $P(\alpha; \pi_J, \rho, \sigma^2, \beta)$ are estimated in the following way. For
each value of $n$, 10,000 samples $(Y_1, \ldots, Y_n)$ are generated using OX software. Next,
$\rho_\alpha(\pi_R)$ and $\rho_\alpha(\pi_J)$ are computed for each of the 10,000 sets of random samples. Then,
$P(\alpha; \pi_R, \rho, \sigma^2, \beta)$ and $P(\alpha; \pi_J, \rho, \sigma^2, \beta)$ are estimated by the proportion of $\rho < \rho_\alpha(\pi_R)$
and $\rho < \rho_\alpha(\pi_J)$, respectively. Equivalently, $F^{\pi_R}(\rho)$ (or $F^{\pi_J}(\rho)$) is computed for the 10,000 sets
of $(Y_1, \ldots, Y_n)$. Then $P(\alpha; \pi_R, \rho, \sigma^2, \beta)$ (or $P(\alpha; \pi_J, \rho, \sigma^2, \beta)$) is estimated by the
proportion of $F^{\pi_R}(\rho)$ (or $F^{\pi_J}(\rho)$) observed to be $< \alpha$. Essentially, we are counting
how often we observe $\rho < \rho_\alpha(\pi_R)$ (or $\rho < \rho_\alpha(\pi_J)$).
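The counting procedure above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reproduction of the study: it uses an intercept-only design $X = 1_k$ (so $p = 1$), for which $S(\rho)$ in (2.14) splits into within-vector and between-vector sums of squares; it evaluates the posterior under the one-at-a-time reference prior on a grid; and it uses far fewer replications than the 10,000 reported here. All function names, the grid size, and the default arguments are hypothetical.

```python
import math
import random


def log_posterior_rho(rho, n, k, within_ss, between_ss):
    """Unnormalized log of (2.14) for the assumed design X = 1_k (p = 1)."""
    a = 1.0 - rho
    b = 1.0 + (k - 1) * rho
    s = within_ss / a + between_ss / b  # S(rho) for this design
    # |X^T V^-1 X|^(-1/2) = (k / b)^(-1/2); the constant k is dropped.
    return (-(n * (k - 1) / 2.0 + 1.0) * math.log(a)
            - (n / 2.0 + 1.0) * math.log(b)
            + 0.5 * math.log(b)
            - 0.5 * (n * k - 1) * math.log(s))


def posterior_cdf(rho0, n, k, within_ss, between_ss, grid=400):
    """P(rho <= rho0 | Y) by the midpoint rule on (-1/(k-1), 1)."""
    lo, hi = -1.0 / (k - 1), 1.0
    step = (hi - lo) / grid
    mids = [lo + (j + 0.5) * step for j in range(grid)]
    logs = [log_posterior_rho(r, n, k, within_ss, between_ss) for r in mids]
    top = max(logs)  # subtract the maximum before exponentiating
    weights = [math.exp(v - top) for v in logs]
    below = sum(w for r, w in zip(mids, weights) if r <= rho0)
    return below / sum(weights)


def simulate_sample(rng, n, k, rho, beta=1.0, sigma=1.0):
    """Draw n vectors from N_k(beta * 1, sigma^2 V) via the closed-form V^(1/2)."""
    ra, rb = math.sqrt(1.0 - rho), math.sqrt(1.0 + (k - 1) * rho)
    sample = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        zbar = sum(z) / k
        sample.append([beta + sigma * (ra * (zj - zbar) + rb * zbar) for zj in z])
    return sample


def sums_of_squares(sample):
    """Within-vector and between-vector sums of squares."""
    n, k = len(sample), len(sample[0])
    row_means = [sum(y) / k for y in sample]
    grand = sum(row_means) / n
    within = sum((yj - m) ** 2 for y, m in zip(sample, row_means) for yj in y)
    between = k * sum((m - grand) ** 2 for m in row_means)
    return within, between


def estimated_coverage(alpha, rho, n, k, reps=500, seed=0):
    """Proportion of replications with F(rho_true) <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        w, b = sums_of_squares(simulate_sample(rng, n, k, rho))
        if posterior_cdf(rho, n, k, w, b) <= alpha:
            hits += 1
    return hits / reps
```

For example, `estimated_coverage(0.05, 0.4, 10, 3)` estimates the lower-tail coverage for $n = 10$ trivariate observations. The grid approximation is crude near the endpoints of the parameter space, so very small $n$ would need a finer or adaptive rule.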


2.4.2  Results 


Table 2.1 provides the estimated tail probabilities of the posterior distributions of $\rho$
under the one-at-a-time reference prior ($\pi_R$) and Jeffreys' prior ($\pi_J$) when the frequentist
tail probabilities are .95 and .05 in the trivariate case. The computation of these
numerical values was based on 10,000 samples for given true values of $\rho$ when the
first, second and third rows of $X$ are (1,0,0), (1,1,0) and (1,1,1), $\beta = (1, 1, 1)^T$ and
$\sigma^2 = 1$. For the cases presented in Table 2.1, we see that the one-at-a-time reference prior
($\pi_R$) is found to meet the target frequentist coverage probabilities more accurately
than Jeffreys' prior, especially for small and moderate sample sizes. We have also
done simulations with other values of $\rho$ such as $\rho = 0$ and 0.8, and the conclusions are
very similar.




Table 2.1: Estimated Tail Probabilities of Posterior Distributions under the One-at-a-time
Reference Prior ($\pi_R$) and Jeffreys' Prior ($\pi_J$) and Different Sample Sizes

        P(0.05; rho, sigma^2, beta)                 P(0.95; rho, sigma^2, beta)
        rho = -0.4          rho = [illegible]       rho = -0.4          rho = [illegible]
 n      pi_R     pi_J       pi_R     pi_J           pi_R     pi_J       pi_R     pi_J
 2      0.0509   0.1254     0.0493   0.1175         0.9252   0.9009     0.9466   0.8372
 5      0.0499   0.0715     0.0498   0.0692         0.9507   0.9272     0.9536   0.9289
10      0.0525   0.0596     0.0496   0.0582         0.9514   0.9406     0.9444   0.9358
20      0.0483   0.0526     0.0489   0.0531         0.9503   0.9445     0.9483   0.9432
30      0.0524   0.0556     0.0491   0.0522         0.9476   0.9446     0.9485   0.9456


2.5  Divergence  Measures 


As noted in Section 2, testing the hypothesis $H_0 : \rho = 0$ amounts to model
selection between the intraclass model and the independence model. Due to the
duality between hypothesis testing and set estimation, this hypothesis testing problem
can be solved by solving a corresponding set estimation problem. In a Bayesian sense, we can
solve this hypothesis test by constructing the Bayesian credible interval of $\rho$. It has
long been known that the distance or divergence between two distributions is often a
suitable basis for model comparison. In this section, we compare the intraclass model
and the independence model by constructing a credible interval for $\rho$ on the basis of
the divergence measures. A suitable criterion for this is the power divergence measure
introduced by Cressie and Read (1984). For two pdf's $f_1$ and $f_2$, this divergence
measure is defined by

$$
D_\lambda(f_1 : f_2) = \frac{1}{\lambda(\lambda + 1)} E_{f_1}\left[ \left\{ \frac{f_1(Y)}{f_2(Y)} \right\}^\lambda - 1 \right].
$$

It can be shown that

$$
\begin{aligned}
\lambda \to 0, &\quad D_\lambda \to E_{f_1} \log\frac{f_1}{f_2} = K(f_1 : f_2); \\
\lambda \to -1, &\quad D_\lambda \to E_{f_2} \log\frac{f_2}{f_1} = K(f_2 : f_1); \\
&\quad D_{-1/2} = 2 \int \left\{ \sqrt{f_1(x)} - \sqrt{f_2(x)} \right\}^2 dx = 2 H(f_1 : f_2).
\end{aligned}
\tag{2.18}
$$

Note that $K(f_1 : f_2)$ is the Kullback-Leibler divergence measure for $f_1$ with respect to $f_2$
and $H(f_1 : f_2)$ is the Hellinger divergence measure. The following theorem provides an
expression for $D_\lambda$ when $f_1$ is $N(X\beta, W_1)$ and $f_2$ is $N(X\beta, W_2)$ and there are $n$ iid
observations from these models.

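Theorem 3 The power divergence measure between $f_1 = N(X\beta, W_1)$ and $f_2 = N(X\beta, W_2)$ is given by

$$
D_\lambda = \frac{1}{\lambda(\lambda + 1)} \left\{ \left( \frac{|W_2|}{|W_1|} \right)^{n\lambda/2} |I_k + \lambda(I_k - Q)|^{-n/2} - 1 \right\}, \quad -1 < \lambda < (k-1)^{-1}, \tag{2.19}
$$

where $Q = W_1^{1/2} W_2^{-1} W_1^{1/2}$.

Remark 3 When $W_1 = (1-\rho) I_k + \rho J_k$ and $W_2 = (1-\rho_0) I_k + \rho_0 J_k$, the power
divergence measure between $f_1 = N(X\beta, W_1)$ and $f_2 = N(X\beta, W_2)$ is given by

$$
D_\lambda = \frac{1}{\lambda(\lambda + 1)} \left[ \left\{ E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda \right\}^n - 1 \right], \tag{2.20}
$$

where

$$
\begin{aligned}
\left\{ E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda \right\}^n
&= (1-\rho_0)^{n(k-1)/2} (1 + (k-1)\rho_0)^{n/2} [1 + \lambda\rho - (\lambda+1)\rho_0]^{-n(k-1)/2} \\
&\quad \times [1 - (k-1)\{\lambda\rho - (\lambda+1)\rho_0\}]^{-n/2}
\left( \frac{1-\rho_0}{1-\rho} \right)^{n\lambda(k-1)/2} \left( \frac{1 + (k-1)\rho_0}{1 + (k-1)\rho} \right)^{n\lambda/2}.
\end{aligned}
$$

Remark 4 The power divergence measure between $f_1$ = intraclass model and $f_2$ =
independence model is given by

$$
D_\lambda = \frac{1}{\lambda(\lambda + 1)} \left[ \frac{(1-\rho)^{-n\lambda(k-1)/2} (1 + (k-1)\rho)^{-n\lambda/2}}{(1 + \lambda\rho)^{n(k-1)/2} (1 - (k-1)\lambda\rho)^{n/2}} - 1 \right]. \tag{2.21}
$$

From Remark 4, we get (when $\rho_0 = 0$)

$$
\begin{aligned}
K(f_1 : f_2) &= -\frac{n}{2}(k-1)\log(1-\rho) - \frac{n}{2}\log(1 + (k-1)\rho), \\
K(f_2 : f_1) &= \frac{n}{2}(k-1)\log(1-\rho) + \frac{n}{2}\log(1 + (k-1)\rho) + \frac{nk(k-1)\rho^2}{2(1-\rho)(1 + (k-1)\rho)}, \\
H(f_1 : f_2) &= 2 - 2^{\frac{nk}{2} + 1} \frac{(1-\rho)^{n(k-1)/4} (1 + (k-1)\rho)^{n/4}}{(2-\rho)^{n(k-1)/2} (2 + (k-1)\rho)^{n/2}}.
\end{aligned}
$$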
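The limiting relations in (2.18) can be checked numerically against the closed forms above. The sketch below is illustrative only: it evaluates $D_\lambda$ from (2.21) and the three special cases as plain Python functions, so that $D_\lambda$ near $\lambda = 0$ and $\lambda = -1$ can be compared with the two Kullback-Leibler expressions and $D_{-1/2}$ with $2H$. The function names are assumptions made for this example.

```python
import math


def power_divergence(rho, lam, n, k):
    """D_lambda of (2.21): intraclass versus independence model (rho_0 = 0).

    Requires -1 < lam < 1/(k-1) with lam not equal to 0 or -1.
    """
    log_u = n * (-0.5 * lam * (k - 1) * math.log(1.0 - rho)
                 - 0.5 * lam * math.log(1.0 + (k - 1) * rho)
                 - 0.5 * (k - 1) * math.log(1.0 + lam * rho)
                 - 0.5 * math.log(1.0 - (k - 1) * lam * rho))
    # expm1 keeps precision when lam is close to 0 or -1.
    return math.expm1(log_u) / (lam * (lam + 1.0))


def kl_intraclass_vs_independence(rho, n, k):
    """K(f1 : f2)."""
    return -0.5 * n * ((k - 1) * math.log(1.0 - rho) + math.log(1.0 + (k - 1) * rho))


def kl_independence_vs_intraclass(rho, n, k):
    """K(f2 : f1)."""
    d = (1.0 - rho) * (1.0 + (k - 1) * rho)
    logs = (k - 1) * math.log(1.0 - rho) + math.log(1.0 + (k - 1) * rho)
    return 0.5 * n * logs + n * k * (k - 1) * rho**2 / (2.0 * d)


def hellinger(rho, n, k):
    """H(f1 : f2)."""
    ratio = ((1.0 - rho) ** (n * (k - 1) / 4.0) * (1.0 + (k - 1) * rho) ** (n / 4.0)
             / ((2.0 - rho) ** (n * (k - 1) / 2.0) * (2.0 + (k - 1) * rho) ** (n / 2.0)))
    return 2.0 - 2.0 ** (n * k / 2.0 + 1.0) * ratio
```

With $n = 3$ and $k = 4$ as in the numerical example later in this section, calling `power_divergence(rho, 1e-6, 3, 4)` should agree closely with `kl_intraclass_vs_independence(rho, 3, 4)` for any admissible `rho`.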


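Proof of Theorem 3.

$$
D_\lambda = \frac{1}{\lambda(\lambda + 1)} E_{f_1}\left[ \left\{ \frac{f_1(Y_1, \ldots, Y_n)}{f_2(Y_1, \ldots, Y_n)} \right\}^\lambda - 1 \right]
= \frac{1}{\lambda(\lambda + 1)} \left[ \left\{ E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda \right\}^n - 1 \right].
$$

But,

$$
E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda = \left( \frac{|W_2|}{|W_1|} \right)^{\lambda/2}
E_{f_1} \exp\left[ -\frac{\lambda}{2} (Y_1 - X\beta)^T (W_1^{-1} - W_2^{-1}) (Y_1 - X\beta) \right].
$$

Since $Y_1 - X\beta \sim N(0, W_1)$ when $f_1$ holds,

$$
E \exp\left[ -\frac{\lambda}{2} (Y_1 - X\beta)^T (W_1^{-1} - W_2^{-1}) (Y_1 - X\beta) \right]
= E \exp\left[ -\frac{\lambda}{2} Z^T (I_k - Q) Z \right],
$$

where $Z \sim N(0, I_k)$ and $Q = W_1^{1/2} W_2^{-1} W_1^{1/2}$. On simplification, one gets

$$
E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda = \left( \frac{|W_2|}{|W_1|} \right)^{\lambda/2} |I_k + \lambda(I_k - Q)|^{-1/2}.
$$

Proof of Remark 3. In this case, $W_1 = (1-\rho) I_k + \rho J_k$ and $W_2 = (1-\rho_0) I_k + \rho_0 J_k$.
Thus,

$$
\left( \frac{|W_2|}{|W_1|} \right)^{n\lambda/2} = \left( \frac{1-\rho_0}{1-\rho} \right)^{n\lambda(k-1)/2} \left( \frac{1 + (k-1)\rho_0}{1 + (k-1)\rho} \right)^{n\lambda/2}
$$

and

$$
\begin{aligned}
|W_2|\, \left| I_k + \lambda\left( I_k - W_1^{1/2} W_2^{-1} W_1^{1/2} \right) \right|
&= |(\lambda + 1) I_k| \left| W_2 - \frac{\lambda}{\lambda + 1} W_1 \right| \\
&= (\lambda + 1)^k \left| \left( 1 - \rho_0 - \frac{\lambda}{\lambda + 1}(1-\rho) \right) I_k + \left( \rho_0 - \frac{\lambda}{\lambda + 1}\rho \right) J_k \right| \\
&= (\lambda + 1)^k \left( \frac{1}{\lambda + 1} + \frac{\lambda}{\lambda + 1}\rho - \rho_0 \right)^{k-1}
\left[ \frac{1}{\lambda + 1} + (k-1)\rho_0 - \frac{(k-1)\lambda}{\lambda + 1}\rho \right] \\
&= [1 + \lambda\rho - (\lambda + 1)\rho_0]^{k-1} [1 - (k-1)\{\lambda\rho - (\lambda + 1)\rho_0\}].
\end{aligned}
$$

After some simplifications, we get

$$
\begin{aligned}
\left\{ E_{f_1}\left( \frac{f_1(Y_1)}{f_2(Y_1)} \right)^\lambda \right\}^n
&= (1-\rho_0)^{n(k-1)/2} (1 + (k-1)\rho_0)^{n/2} [1 + \lambda\rho - (\lambda + 1)\rho_0]^{-n(k-1)/2} \\
&\quad \times [1 - (k-1)(\lambda\rho - (\lambda + 1)\rho_0)]^{-n/2}
\left( \frac{1-\rho_0}{1-\rho} \right)^{n\lambda(k-1)/2} \left( \frac{1 + (k-1)\rho_0}{1 + (k-1)\rho} \right)^{n\lambda/2}.
\end{aligned}
$$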


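We now prove a theorem which establishes that $D_\lambda$, as a function of $\rho$, has a minimum at
$\rho = 0$ if $-1 < \lambda < (k-1)^{-1}$.

Theorem 4 When $\rho_0 = 0$, $D_\lambda$ has a minimum at $\rho = 0$ if $-1 < \lambda < (k-1)^{-1}$.

Proof of Theorem 4. We consider

$$
t(\rho) = \frac{1}{\lambda(\lambda + 1)} \left[ \frac{(1-\rho)^{-\lambda(k-1)/2} (1 + (k-1)\rho)^{-\lambda/2}}{(1 + \lambda\rho)^{(k-1)/2} (1 - (k-1)\lambda\rho)^{1/2}} \right]^n.
$$

It suffices to show that $t(\rho)$ has a minimum at $\rho = 0$. Note

$$
\begin{aligned}
\frac{\partial}{\partial\rho} \log t(\rho)
&= \frac{n\lambda(k-1)}{2(1-\rho)} - \frac{n\lambda(k-1)}{2[1 + (k-1)\rho]} - \frac{n\lambda(k-1)}{2[1 + \lambda\rho]} + \frac{n\lambda(k-1)}{2[1 - (k-1)\lambda\rho]} \\
&= \frac{\frac{1}{2} nk(k-1)\lambda(\lambda + 1)\rho\,[1 - (k-1)\lambda\rho^2]}{(1-\rho)(1 + (k-1)\rho)(1 + \lambda\rho)(1 - (k-1)\lambda\rho)}.
\end{aligned}
$$

Since $\frac{\partial}{\partial\rho} t(\rho) = \left\{ \frac{\partial}{\partial\rho} \log t(\rho) \right\} t(\rho)$,

$$
\frac{\partial}{\partial\rho} t(\rho) = \tfrac{1}{2} nk(k-1)\rho\,[1 - (k-1)\lambda\rho^2] (1-\rho)^{-\frac{n\lambda(k-1)}{2} - 1}
[1 + (k-1)\rho]^{-\frac{n\lambda}{2} - 1} (1 + \lambda\rho)^{-\frac{n(k-1)}{2} - 1} [1 - (k-1)\lambda\rho]^{-\frac{n}{2} - 1}.
$$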




Table 2.2: Credible Intervals for $\rho$ Based on Different Procedures

                 90%                             95%
procedure        interval             length     interval             length
equal tail       (-0.2743, 0.7150)    0.9893     (-0.2888, 0.8459)    1.1347
HPD              (-0.3165, 0.5226)    0.8391     (-0.3188, 0.7170)    1.0358
K(f1 : f2)       (-0.3333, 0.7575)    1.0908     (-0.3333, 0.8705)    1.2038
K(f2 : f1)       (-0.3333, 0.7575)    1.0908     (-0.3333, 0.8705)    1.2038
H(f1 : f2)       (-0.3333, 0.7575)    1.0908     (-0.3333, 0.8705)    1.2038


Note that $\frac{\partial}{\partial\rho} t(\rho) \big|_{\rho = 0} = 0$. When $-1 < \lambda < (k-1)^{-1}$, $t(\rho)$ is a strictly increasing
function if $\rho > 0$ and a strictly decreasing function if $\rho < 0$, implying that $D_\lambda$ has a
minimum at $\rho = 0$ when $-1 < \lambda < (k-1)^{-1}$.
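Theorem 4 is easy to probe numerically. The sketch below, offered as an illustration rather than as part of the proof, evaluates $D_\lambda$ from (2.21) on an interior grid of $\rho$ values and returns the grid point with the smallest divergence, which should sit next to $\rho = 0$ for any admissible $\lambda$. The function names and the grid size are assumptions made for this example.

```python
import math


def d_lambda(rho, lam, n, k):
    """D_lambda from (2.21) with rho_0 = 0; lam must not be 0 or -1."""
    log_u = n * (-0.5 * lam * (k - 1) * math.log(1.0 - rho)
                 - 0.5 * lam * math.log(1.0 + (k - 1) * rho)
                 - 0.5 * (k - 1) * math.log(1.0 + lam * rho)
                 - 0.5 * math.log(1.0 - (k - 1) * lam * rho))
    return math.expm1(log_u) / (lam * (lam + 1.0))


def argmin_on_grid(lam, n, k, points=2001):
    """Grid point in (-1/(k-1), 1) at which D_lambda is smallest."""
    lo, hi = -1.0 / (k - 1), 1.0
    # Interior grid only: the logarithms diverge at the two endpoints.
    grid = [lo + (hi - lo) * j / (points + 1) for j in range(1, points + 1)]
    return min(grid, key=lambda r: d_lambda(r, lam, n, k))
```

Because the grid does not contain $\rho = 0$ exactly, the returned value is one of the two grid points that bracket zero.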


Next we consider a numerical example to compare the lengths of the credible intervals
obtained by (i) the equal two-tailed procedure, (ii) the HPD procedure, and (iii) the
two Kullback-Leibler divergence measures and the Hellinger divergence measure.
Here $n = 3$ and $k = 4$. The data (Searle et al. 1992,
p. 47) in the first, second and third classes are (3, 3, 12, 2), (11, 13, 17, 7), and (4, 2,
1, 33) respectively. Table 2.2 provides 90% and 95% credible intervals for $\rho$ using the
procedures mentioned above. We note that the HPD interval has the shortest length,
the equal tail interval is longer, while the intervals based on the divergence measures
are the longest. The three divergence-based intervals all have the same length.




2.6  Concluding  Remarks 

The present intraclass model neither strictly contains nor is strictly contained in
the balanced one-way random effects model. For the latter, the intraclass correlation
coefficient is $\rho = \sigma_\tau^2/(\sigma_\tau^2 + \sigma_e^2)$, where $\sigma_\tau^2$ and $\sigma_e^2$ are respectively the treatment and error
variances. In this case $\rho \in (0, 1)$, unlike our case where it can assume negative values.
First order quantile matching priors for the balanced one-way random effects model
with various parameters of interest are considered in Datta and M. Ghosh (1995).
For the present model, the one-at-a-time reference prior is a second order probability
matching prior and is also a matching prior via inversion of the conditional likelihood
ratio test statistic.


CHAPTER  3 

FIRST  ORDER  AUTOREGRESSIVE  MODELS 
3.1  Introduction 


Autoregressive  models  are  routinely  used  for  the  analysis  of  time  series  data. 
The  classical  inferential  procedures  for  such  models  are  well-documented  in  several 
textbooks  and  multiple  journal  articles.  Among  others  we  refer  to  the  classical  texts 
due  to  Box  and  Jenkins  (1970),  Fuller  (1976),  Brillinger  (1981),  and  the  references 
cited  therein. 

Systematic Bayesian analysis of autoregressive models began with the pioneering
work of Zellner and Tiao (1964), who considered first order autoregressive processes.
Bayesian analysis of both first and second order autoregressive processes is given in
Zellner (1971). Among some of the later developments in this area, an important
contribution is due to Chow (1975), who found the moments of the predictive distribution
of future vector observations. A subsequent Bayesian vector autoregressive model is
due to Litterman (1980), who considered prediction of economic variables. Lahiff (1980)
developed a numerical algorithm which produced posterior and predictive inferences
for first order autoregressive processes. Diaz and Farah (1982) reported a Bayesian
technique for calculating posterior distributions of autoregressive processes of an
arbitrary order. Some of these findings are admirably documented by Broemeling (1985)
in a very general framework.

This chapter focuses on the first order non-explosive stationary autoregressive (AR)
processes and develops default priors which are optimal according to both conditions
(i) and (ii) as discussed in Chapter 1.




In the case of a first order autoregressive model with $y_t = \beta_0 + \rho y_{t-1} + e_t$ ($|\rho| < 1$, $t =
\ldots, -1, 0, 1, \ldots$), Zellner (1971) derived Jeffreys' multivariate prior. Henceforth, this
prior will be referred to as Jeffreys' prior. This prior is proportional to the positive
square root of the determinant of the Fisher information matrix. Zellner showed also that an
approximation to Jeffreys' prior leads to a special case of a class of priors proposed in Thornber
(1967). Jeffreys' prior possesses the important property of being invariant under
one-to-one reparameterization. However, as we will find in this chapter, Jeffreys' prior
does not satisfy either criterion (i) or criterion (ii) as discussed in Chapter 1. On the
other hand, the two group reference prior of Bernardo (1979) and the one-at-a-time
reference prior of Berger and Bernardo (1992a) are both satisfactory under criteria
(i) and (ii) when $\rho$ is the parameter of interest. As shown in Datta and M. Ghosh
(1996), these priors are also invariant under one-to-one reparameterization.

The  outline  of  the  remaining  sections  is  as  follows.  In  Section  2,  we  find  Jeffreys’ 
prior,  and  also  the  different  reference  priors.  It  is  shown  in  this  section  that  the 
reference  priors  are  all  first  order  probability  matching  priors,  but  Jeffreys’  prior  is 
not.  The  propriety  of  the  posterior  both  under  Jeffreys’  prior  and  under  the  reference 
priors is proved in Section 3. Also, in this section, a simulation study is undertaken
for comparing the frequentist coverage probabilities of the different Bayesian credible
intervals having the same posterior coverage probabilities. It is found that the two-
and three-group reference priors have a distinct advantage over Jeffreys' prior in terms
of matching the frequentist coverage probabilities.

In  Section  4,  another  method  of  constructing  Bayesian  credible  intervals  based  on 
a certain  divergence  measure  is  proposed,  and  it  is  shown  via  a numerical  example 
that  the  interval  length  of  the  credible  interval  based  on  the  two-  or  three-  group 
reference  prior  with  equal  tail  probabilities  is  smaller  than  the  one  based  on  the 
Kullback-Leibler  divergence  measure  having  the  same  posterior  coverage  probability. 
Finally,  some  concluding  remarks  are  made  in  Section  5. 




3.2  Development  of  Noninformative  Priors 


3.2.1  Fisher  Information  Matrix 


As mentioned earlier, Jeffreys' prior is the positive square root of the determinant
of the Fisher information matrix. The derivation of the reference priors also stems
from the same matrix. Thus, the first task is to derive the Fisher information matrix
for the first order autoregressive (AR(1)) model given by

$$
Y_t = x_t^T \beta + \rho Y_{t-1} + e_t \quad (t = \ldots, -1, 0, 1, \ldots). \tag{3.1}
$$

In the above, $x_t$ ($p \times 1$) are the design vectors, $\beta$ is the regression vector, $\rho \in (-1, 1)$ is
the first order autocorrelation coefficient, and the errors $e_t$ are iid $N(0, \sigma^2)$. Suppose
the available data consist of $Y_1, \ldots, Y_k$. We write $Y = (Y_1, \ldots, Y_k)^T$, $X^T = (x_1, \ldots, x_k)$.

Then, it can be shown that (Zellner, 1971) $Y \sim N(X\beta, \sigma^2 W)$, where $W = ((w_{ij}))$
and $w_{ij} = \rho^{|i-j|}/(1 - \rho^2)$ $(i, j = 1, \ldots, k)$. We note that when $\rho = 0$, this model reduces
to the linear regression model with independent errors. For the given AR(1) model,
the pdf of $Y$ is given by (Zellner, 1971)

$$
\begin{aligned}
f(Y; \rho, \sigma^2, \beta) &= (2\pi)^{-k/2} |\sigma^2 W|^{-1/2} \exp\left[ -\frac{(Y - X\beta)^T W^{-1} (Y - X\beta)}{2\sigma^2} \right] \\
&= (2\pi)^{-k/2} (\sigma^2)^{-k/2} (1 - \rho^2)^{1/2} \exp\left[ -\frac{(Y - X\beta)^T W^{-1} (Y - X\beta)}{2\sigma^2} \right].
\end{aligned}
\tag{3.2}
$$

Then, the Fisher information matrix $I(\rho, \sigma^2, \beta)$ is given by

$$
I = I(\rho, \sigma^2, \beta) =
\begin{pmatrix}
\dfrac{(k-3)(1-\rho^2) + 2}{(1-\rho^2)^2} & \dfrac{\rho}{\sigma^2 (1-\rho^2)} & 0^T \\
\dfrac{\rho}{\sigma^2 (1-\rho^2)} & \dfrac{k}{2\sigma^4} & 0^T \\
0 & 0 & \dfrac{1}{\sigma^2} X^T W^{-1} X
\end{pmatrix}.
\tag{3.3}
$$
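The $(\rho, \sigma^2)$ entries of (3.3) can be verified numerically from the covariance structure alone. The Python sketch below is a minimal illustration under two assumptions that are not stated in the text: it uses the well-known tridiagonal form of $W^{-1}$ for a stationary AR(1) covariance, and the generic Gaussian formula $\frac{1}{2}\mathrm{tr}(\Sigma^{-1}\partial_a\Sigma\,\Sigma^{-1}\partial_b\Sigma)$ with $\Sigma = \sigma^2 W$. All function names are hypothetical.

```python
def ar1_w(rho, k):
    """W with entries rho^|i-j| / (1 - rho^2)."""
    return [[rho ** abs(i - j) / (1.0 - rho**2) for j in range(k)] for i in range(k)]


def ar1_w_inverse(rho, k):
    """Tridiagonal inverse of W (assumed standard AR(1) precision matrix)."""
    t = [[0.0] * k for _ in range(k)]
    for i in range(k):
        t[i][i] = 1.0 if i in (0, k - 1) else 1.0 + rho**2
        if i + 1 < k:
            t[i][i + 1] = t[i + 1][i] = -rho
    return t


def ar1_dw(rho, k):
    """Entrywise derivative of W with respect to rho."""
    s = 1.0 - rho**2
    out = []
    for i in range(k):
        row = []
        for j in range(k):
            m = abs(i - j)
            lead = m * rho ** (m - 1) / s if m > 0 else 0.0
            row.append(lead + 2.0 * rho ** (m + 1) / s**2)
        out.append(row)
    return out


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def ar1_info(rho, sigma2, k):
    """(I_rho_rho, I_rho_sigma2, I_sigma2_sigma2) from the trace formula."""
    tdw = matmul(ar1_w_inverse(rho, k), ar1_dw(rho, k))
    i_rr = 0.5 * trace(matmul(tdw, tdw))
    i_rs = 0.5 * trace(tdw) / sigma2
    return i_rr, i_rs, k / (2.0 * sigma2**2)
```

Comparing `ar1_info(rho, sigma2, k)` with the closed-form entries of (3.3) for a few values of `rho` in $(-1, 1)$ gives an independent check of the algebra.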


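It is convenient to find an orthogonal reparameterization (Cox and Reid, 1987) of
$(\rho, \sigma^2, \beta)$. To this end, let $\theta_1 = \rho$, $\theta_2 = g(\rho, \sigma^2)$, and $\theta_3 = \beta$, and we obtain a solution
for

$$
I(\rho, \sigma^2, \beta) =
\begin{pmatrix} 1 & \partial g/\partial\rho & 0^T \\ 0 & \partial g/\partial\sigma^2 & 0^T \\ 0 & 0 & I_p \end{pmatrix}
\begin{pmatrix} I_{\theta_1\theta_1} & 0 & 0^T \\ 0 & I_{\theta_2\theta_2} & 0^T \\ 0 & 0 & I_{\theta_3\theta_3} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0^T \\ \partial g/\partial\rho & \partial g/\partial\sigma^2 & 0^T \\ 0 & 0 & I_p \end{pmatrix}.
\tag{3.4}
$$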


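This leads to the equations

$$
\begin{aligned}
\text{(a)}\quad & I_{\theta_1\theta_1} + \left( \frac{\partial g}{\partial\rho} \right)^2 I_{\theta_2\theta_2} = \frac{(k-3)(1-\rho^2) + 2}{(1-\rho^2)^2}, \\
\text{(b)}\quad & \left( \frac{\partial g}{\partial\rho} \right) \left( \frac{\partial g}{\partial\sigma^2} \right) I_{\theta_2\theta_2} = \frac{\rho}{\sigma^2 (1-\rho^2)}, \\
\text{and (c)}\quad & \left( \frac{\partial g}{\partial\sigma^2} \right)^2 I_{\theta_2\theta_2} = \frac{k}{2\sigma^4}.
\end{aligned}
$$

From (a)-(c), one gets

$$
\frac{\partial g/\partial\rho}{\partial g/\partial\sigma^2} = \frac{2\rho\sigma^2}{k(1-\rho^2)},
$$

and a solution is given by $g = \sigma^2 (1-\rho^2)^{-1/k}$.

Thus, an orthogonal transformation is given by

$$
\theta_1 = \rho, \quad \theta_2 = \sigma^2 (1-\rho^2)^{-1/k}, \quad \theta_3 = \beta.
$$

With this orthogonal transformation, the Fisher information matrix reduces to

$$
I(\theta) = I(\theta_1, \theta_2, \theta_3) = \mathrm{Diag}\left[ \frac{(k-1)\{k(1-\theta_1^2) + 2\theta_1^2\}}{k(1-\theta_1^2)^2},\ \frac{k}{2\theta_2^2},\ h(\theta_1, \theta_2) \right], \tag{3.5}
$$

where $h(\theta_1, \theta_2) = \theta_2^{-1} (1-\theta_1^2)^{-1/k} X^T W^{-1} X$.

Thus, Jeffreys' prior is given by

$$
\begin{aligned}
\pi_J(\theta) &\propto (1-\theta_1^2)^{-1} [k(1-\theta_1^2) + 2\theta_1^2]^{1/2} \theta_2^{-1} |h(\theta_1, \theta_2)|^{1/2} \\
&= (1-\theta_1^2)^{-1} [k - (k-2)\theta_1^2]^{1/2} \theta_2^{-(p+2)/2} (1-\theta_1^2)^{-p/(2k)} |X^T W^{-1} X|^{1/2}.
\end{aligned}
\tag{3.6}
$$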


3.2.2  Reference  Priors 


In view of the form of the information matrix as given in (3.5), when $\theta_1$ is the
parameter of interest, by taking rectangular compacts for $\theta_1$, $\theta_2$, and $\theta_3$, it follows
from Berger and Bernardo (1992a) or Datta and M. Ghosh (1995) that the two
group reference prior for $\{\theta_1, (\theta_2, \theta_3)\}$ is $\pi_{R2}(\theta) \propto \theta_2^{-(p+2)/2} (1-\theta_1^2)^{-1} (k(1-\theta_1^2) + 2\theta_1^2)^{1/2}$,
while the three group reference prior with the ordering $\{\theta_1, \theta_2, \theta_3\}$ or $\{\theta_1, \theta_3, \theta_2\}$ is
$\pi_{R3}(\theta) \propto \theta_2^{-1} (1-\theta_1^2)^{-1} (k(1-\theta_1^2) + 2\theta_1^2)^{1/2}$. In spite of their apparent similarity, as
we will see, these two reference priors can provide different answers in actual data
analysis.

3.2.3  Matching  Priors 

For the specific first order autoregressive model, $\rho = \theta_1$ is the parameter of interest,
and $\theta_1$ is orthogonal to $(\theta_2, \theta_3)$. Hence, by Tibshirani (1989), the class of first order
probability matching priors is characterized by

$$
\pi(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1^2)^{-1} (k(1-\theta_1^2) + 2\theta_1^2)^{1/2} g(\theta_2, \theta_3), \tag{3.7}
$$

where $g(\cdot)$ is an arbitrary positive function of $\theta_2$ and $\theta_3$, differentiable in its arguments.
We note that both the two and three group reference priors are first order matching
priors, but Jeffreys' prior $\pi_J(\theta)$ as given in (3.6) is not a first order probability
matching prior. Thus, while the two and three group reference priors meet criterion
(ii), the same is not true of Jeffreys' multivariate prior.

To narrow down the choice among the infinitely many first order matching priors,
we consider second order probability matching priors. Since $\theta_3$ is the regression
coefficient, it seems meaningful to consider the subclass $g(\theta_2, \theta_3) \propto h(\theta_2)$. Since
parametric  orthogonality  holds  and  ~ ^ 2),  following  Mukerjee 

and  M.  Ghosh  (1997),  a second  order  matching  prior  within  the  above  subclass  is 
found  as  a solution  of 


where  = r^{0)  = ((n)),  - Eg  {dlog  f(Y-  0)/dO^}\  and  Lu2  = E 

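Theorem 5 There does not exist any second order quantile matching prior within
the subclass $\pi(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1^2)^{-1}(k(1-\theta_1^2)+2\theta_1^2)^{1/2} h(\theta_2)$.


Proof of Theorem 5. With the parameterization $\theta = (\theta_1, \theta_2, \theta_3)$ from $(\rho, \sigma^2, \beta)$, the
log-likelihood is given by

$$\log f(Y; \theta_1, \theta_2, \theta_3) = -\frac{k}{2}\log\theta_2 - \frac{1}{2\theta_2}(1-\theta_1^2)^{-1/k}(Y - X\theta_3)^T W^{-1}(Y - X\theta_3) + \text{constant}.$$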


Then,  after  algebraic  simplifications,  it  follows  that 


Slog/ 


202(1-01)^ 


r(Y-X0s)^A,(Y-X03), 




where  A,  = 

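Next, to find $L_{1,1,1}$, we begin with two identities

$$L_{1,1,1} + 3L_{11,1} + L_{111} = 0; \tag{3.9}$$

$$\frac{\partial I_{11}}{\partial\theta_1} = -L_{11,1} - L_{111}, \tag{3.10}$$

where $L_{11,1} = E\left[\left(\frac{\partial^2 \log f}{\partial\theta_1^2}\right)\left(\frac{\partial \log f}{\partial\theta_1}\right) \Big|\, \theta\right]$ and $L_{111} = E\left[\frac{\partial^3 \log f}{\partial\theta_1^3} \Big|\, \theta\right]$.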

Also, on simplification, we get

$$L_{111} = -\frac{4(k-1)\theta_1}{k^2(1-\theta_1^2)^3}\,(3k - (k-2)\theta_1^2).$$


Solving  (3.9)  and  (3.10)  after  some  simplifications. 


n,i,i 


+ 3 


dL 


11 


d0i 


jfc2(l-02)3V  V ; i;  ^ 


'k-(k-2)0f 


J;2(1_^2)3  ' ' ' 1' 


(3.11) 


Hence, since $\pi(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1^2)^{-1}(k(1-\theta_1^2)+2\theta_1^2)^{1/2} h(\theta_2)$, from (3.5) and
(3.11), it follows that


^(/n-Lu,r) 


5^^(l-dJ)(fc-(*-2)(lJ)-|. 

(A:  — 1)2 


Also,  Lu2  = ^2 ' Ai  so  that  4(/n'/'Lii2/'2/i(^2))  = |4  [O2H02)]  and  hence 




where 


S2(«i)  = [0^62)] . 


Thus,  there  does  not  exist  any  second  order  quantile  matching  prior  within  the  above 
subclass. 

3.2.4  Matching  Based  on  Highest  Posterior  Density  (HPD)  Regions 

Next we discuss matching through the HPD region. The open question is
whether or not there exists an HPD matching prior within the subclass
$\pi(\theta_1, \theta_2, \theta_3) \propto (1-\theta_1^2)^{-1}(k(1-\theta_1^2)+2\theta_1^2)^{1/2} h(\theta_2)$. Since $\theta_1$ is orthogonal to $(\theta_2, \theta_3)$
in the given AR(1) model, from (4.1), p. 137 of J.K. Ghosh and Mukerjee (1995), it follows that a prior
$\pi$ is an HPD matching prior if and only if


(3.12) 


After  some  heavy  algebra. 


where 


k 


{k-{k-  2)9l)  ^ [{k  -2){k  + 8)9l  - k{k  - 10)] , hi{92)  = h{92) 


k-l 


9i{0i)  = 




»2(«i)  = \(1  - dir'ik  -(k-  2)el)K  /i2(fe)  = ^ [«2ft(»2)l . 

Thus,  there  does  not  exist  any  HPD  matching  prior  within  the  above  subclass. 


3.3  Propriety  of  the  Posterior  Distributions 


First we find the marginal posterior distribution of $\rho$ under the two group reference
prior. Suppose $Y_1, Y_2, \ldots, Y_n$ are iid random vectors having the above AR(1) model.
The likelihood function is given by

$$L(\rho, \sigma^2, \beta) \propto (\sigma^2)^{-nk/2}(1-\rho^2)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X\beta)^T W^{-1}(Y_i - X\beta)\right]. \tag{3.13}$$


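Writing $Y = (Y_1, \ldots, Y_n)$, under the two group reference prior, the joint posterior
density is given by

$$\pi_{R2}(\rho, \sigma^2, \beta \mid Y) \propto (\sigma^2)^{-\frac{1}{2}(nk+3)}(1-\rho^2)^{\frac{n}{2}+\frac{1}{2k}-1}(k(1-\rho^2)+2\rho^2)^{1/2}
\times \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X\beta)^T W^{-1}(Y_i - X\beta)\right]. \tag{3.14}$$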


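Writing

$$\hat\beta = (X^T W^{-1} X)^{-1} X^T W^{-1}\bar{Y} \quad \left(\bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i\right),$$

$$S(\rho) = \sum_{i=1}^{n}(Y_i - X\hat\beta)^T W^{-1}(Y_i - X\hat\beta),$$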


integration with respect to $\beta$ and $\sigma^2$ yields the marginal posterior pdf of $\rho$ as

$$\pi_{R2}(\rho \mid Y) \propto (1-\rho^2)^{\frac{n}{2}+\frac{1}{2k}-1}(k(1-\rho^2)+2\rho^2)^{1/2}\, S(\rho)^{-\frac{1}{2}(nk-p+1)}\, |X^T W^{-1} X|^{-1/2}. \tag{3.15}$$


The following theorem proves the propriety of $\pi_{R2}(\rho \mid Y)$.

Theorem 6 $\pi_{R2}(\rho \mid Y)$ is proper.

Proof of Theorem 6. Since $W^{-1}$ is a symmetric positive definite matrix, denoting its
eigenvalues by $\lambda_1, \ldots, \lambda_k$, one has

$$W^{-1} \le \max\{\lambda_1, \ldots, \lambda_k\}\, I_k \le \mathrm{tr}(W^{-1})\, I_k = \{2 + (k-2)(1+\rho^2)\}\, I_k \le 2k\, I_k. \tag{3.16}$$

Thus, 

< {2k)~^\X^X\~'^ 

^ S{p)  < 2k  {Yi  - X^fiYi  - x3)| . (3.17) 

Also,  note  that  k{l  — p^)  + 2p^  < k.  Hence,  'Kr2{p  \ Y)  < c(l  — \ 

where  in  the  above  and  in  what  follows,  c (>  0)  is  a generic  constant  which  may 
depend  on  Y and  X,  but  does  not  depend  on  p.  Hence,  the  marginal  posterior 
density  of  p under  the  two  group  reference  prior  is  integrable  on  (—1, 1).  The  result 
follows. 
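
The trace identity and the eigenvalue bound used in (3.16) can be checked numerically. The following is a minimal sketch, assuming that $W^{-1}$ is the usual tridiagonal precision matrix of a stationary AR(1) vector (unit corner entries, $1+\rho^2$ on the interior diagonal, $-\rho$ off the diagonal); the function names are illustrative and not part of the original text.

```python
import numpy as np

def ar1_precision(rho: float, k: int) -> np.ndarray:
    """Tridiagonal W^{-1} of a stationary AR(1) vector of length k >= 2 (assumed form)."""
    w_inv = np.zeros((k, k))
    idx = np.arange(k)
    w_inv[idx, idx] = 1.0 + rho**2        # interior diagonal entries
    w_inv[0, 0] = w_inv[-1, -1] = 1.0     # corner entries
    w_inv[idx[:-1], idx[1:]] = -rho       # superdiagonal
    w_inv[idx[1:], idx[:-1]] = -rho       # subdiagonal
    return w_inv

def largest_eigenvalue_and_trace(rho: float, k: int):
    """Return (max eigenvalue, trace) of W^{-1}, the two quantities compared in (3.16)."""
    w_inv = ar1_precision(rho, k)
    return np.linalg.eigvalsh(w_inv).max(), np.trace(w_inv)
```

For any $|\rho| < 1$ the largest eigenvalue should not exceed the trace $2 + (k-2)(1+\rho^2)$, which in turn is at most $2k$.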


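Second, we consider the marginal pdf of $\rho$ under the three group reference prior. Here
the joint posterior density is given by

$$\pi_{R3}(\rho, \sigma^2, \beta \mid Y) \propto (\sigma^2)^{-\frac{1}{2}(nk+2)}(1-\rho^2)^{\frac{n}{2}-1}(k(1-\rho^2)+2\rho^2)^{1/2}
\times \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X\beta)^T W^{-1}(Y_i - X\beta)\right]. \tag{3.18}$$

Integration with respect to $\beta$ and $\sigma^2$ yields the marginal posterior pdf of $\rho$ as

$$\pi_{R3}(\rho \mid Y) \propto (1-\rho^2)^{\frac{n}{2}-1}(k(1-\rho^2)+2\rho^2)^{1/2}\, S(\rho)^{-\frac{1}{2}(nk-p)}\, |X^T W^{-1} X|^{-1/2}. \tag{3.19}$$

As in Theorem 6, it can be checked that

$$\pi_{R3}(\rho \mid Y) \le c\,(1-\rho^2)^{\frac{n}{2}-1}.$$

The propriety of $\pi_{R3}(\rho \mid Y)$ for $n > 1$ immediately follows.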


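Finally, we consider the marginal pdf of $\rho$ under Jeffreys' prior. From (3.3), we
obtain Jeffreys' prior as

$$\pi_J(\rho, \sigma^2, \beta) \propto (\sigma^2)^{-\frac{1}{2}(p+2)}(1-\rho^2)^{-1}(k(1-\rho^2)+2\rho^2)^{1/2}\, |X^T W^{-1} X|^{1/2}. \tag{3.20}$$

Thus, under Jeffreys' prior, the joint posterior pdf is given by

$$\pi_J(\rho, \sigma^2, \beta \mid Y) \propto (\sigma^2)^{-\frac{1}{2}(nk+p+2)}(1-\rho^2)^{\frac{n}{2}-1}(k(1-\rho^2)+2\rho^2)^{1/2}\, |X^T W^{-1} X|^{1/2}
\times \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X\beta)^T W^{-1}(Y_i - X\beta)\right]. \tag{3.21}$$

Integration with respect to $\beta$ and $\sigma^2$ yields the marginal posterior pdf of $\rho$ as

$$\pi_J(\rho \mid Y) \propto (1-\rho^2)^{\frac{n}{2}-1}(k(1-\rho^2)+2\rho^2)^{1/2}\, S(\rho)^{-nk/2}. \tag{3.22}$$

As before,

$$\pi_J(\rho \mid Y) \le c\,(1-\rho^2)^{\frac{n}{2}-1},$$

and the propriety of $\pi_J(\rho \mid Y)$ follows.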


Next  we  compare  the  reference  priors  with  Jeffreys’  prior  for  the  AR  (1)  model. 
We  accomplish  this  by  calculating  the  frequentist  coverage  probability  of  the  posterior 




Table  3.1:  Estimated  Tail  Probabilities  of  Posterior  Distributions  under  the  Two 
Group  Reference  Prior  {ttr2),  Three  Group  Reference  Prior  {ttrs)  and  Jeffreys’  Prior 
(ttj)  and  Different  Sample  Sizes 


n 

p = -.3 

p = .3 

P(.05;p,  a\j3) 

P(.05;p,  a\!3) 

T^R2 

T^R2 

T^Ri 

2 

.1076 

.0613 

.0336 

.1111 

.0681 

.0389 

5 

.0672 

.0517 

.0441 

.0657 

.0523 

.0453 

10 

.0575 

.0511 

.0475 

.0552 

.0488 

.0458 

20 

.0563 

.0531 

.0514 

.0521 

.0492 

.0476 

n 

p=-.3 

p = .3 

P(.95;p, 

P(.95;p,  cr^,/3) 

T^R2 

TTj 

T^R2 

2 

.8904 

.9315 

.9540 

.8912 

.9369 

.9643 

5 

.9354 

.9485 

.9544 

.9367 

.9508 

.9581 

10 

.9418 

.9490 

.9521 

.9376 

.9449 

.9485 

20 

.9455 

.9483 

.9506 

.9411 

.9436 

.9451 

tail  probability  of  p.  We  follow  the  technique  of  Sun  and  Ye  (1996)  for  numerical 
calculations  as  discussed  in  Chapter  2. 

Table 3.1 provides the estimated tail probabilities of the posterior distributions of
$\rho$ under the two group reference prior ($\pi_{R2}$), three group reference prior ($\pi_{R3}$) and Jeffreys'
prior ($\pi_J$) when the frequentist tail probabilities are .05 and .95. The computation of
these numerical values was based on 10,000 samples for given true values of $\rho$ when
the first, second and third rows of $X$ are (1,0,0), (1,1,0) and (1,1,1), $\beta = (1, 1, 1)^T$
and $\sigma = 1$. For the cases presented in Table 3.1, the two group reference prior ($\pi_{R2}$)
and three group reference prior ($\pi_{R3}$) are found to meet the target frequentist coverage
probabilities more accurately than Jeffreys' prior, especially for small and moderate
sample sizes. We have also done simulations with other values of $\rho$ such as $\rho = 0$ and
.5, and the conclusions are very similar.


3.4  Divergence  Measures 


In  this  section,  we  introduce  another  criterion  for  constructing  credible  intervals. 
It  has  long  been  known  that  the  distance  or  divergence  between  two  distributions  is 
often  a suitable  basis  for  model  comparison.  One  such  criterion  is  the  Kullback-Leibler 
divergence measure. As noted in Chapter 2, $K(f_1 : f_2)$ is, in general, not the same as
$K(f_2 : f_1)$. Jeffreys proposed the symmetric measure $J(f_1 : f_2) = K(f_1 : f_2) + K(f_2 : f_1)$
as the divergence measure between two distributions. For the specific problem,
we get the following result when $f_1$ is the first order autoregressive model, while
$f_2$ is the independence model, and there are $n$ iid observations from these models.

Remark 5

$$K(f_1 : f_2) = \frac{n}{2}\left[\log(1-\rho^2) + \frac{k\rho^2}{1-\rho^2}\right]; \tag{3.23}$$

$$K(f_2 : f_1) = \frac{n}{2}\left[(k-2)\rho^2 - \log(1-\rho^2)\right]; \tag{3.24}$$

$$J(f_1 : f_2) = \frac{n}{2}\left[\frac{k\rho^2}{1-\rho^2} + (k-2)\rho^2\right]. \tag{3.25}$$

Also, simple calculations show that $K(f_1 : f_2)$, $K(f_2 : f_1)$, and $J(f_1 : f_2)$ are all convex
functions of $\rho$ with minimum at $\rho = 0$. Thus, any of these divergence measures less
than or equal to some constant provides an interval for $\rho$. We choose this constant
such that the credible interval has a prescribed coverage probability.

We now generate a dataset to compare the lengths of (i) the equal tailed credible
interval, (ii) the HPD credible interval and (iii) the interval based on Jeffreys' divergence
measure, all having the same coverage probability. Each data point is generated from
a 15-dimensional first order autoregressive process with normal errors, and with a
15 x 1 design vector given in Haavelmo (1947, p. 118). The sample size is 5, and
the autocorrelation coefficient $\rho = .5$. The marginal posteriors of $\rho$ under the three
different priors are given in Figure 3.1. It is seen that all three posterior distributions
have essentially the same shape. Thus, it suffices to consider the three group
reference prior.
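
The closed forms for the divergence measures in Remark 5 can be checked against the generic Kullback-Leibler formula for two Gaussian distributions with a common mean. The sketch below is only illustrative: it assumes $\sigma^2 = 1$, takes the independence model to have covariance $I_k$, and uses the stationary AR(1) covariance $\rho^{|i-j|}/(1-\rho^2)$; all function names are hypothetical.

```python
import numpy as np

def gaussian_kl(cov_p: np.ndarray, cov_q: np.ndarray) -> float:
    """K(p : q) = E_p[log(p/q)] for two Gaussians sharing the same mean."""
    k = cov_p.shape[0]
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    return 0.5 * (logdet_q - logdet_p + trace_term - k)

def ar1_cov(rho: float, k: int) -> np.ndarray:
    """Covariance of a stationary AR(1) vector with unit innovation variance."""
    lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    return rho**lags / (1.0 - rho**2)

def k12(rho: float, k: int, n: int = 1) -> float:
    """Closed form (3.23): AR(1) model relative to the independence model."""
    return 0.5 * n * (np.log(1 - rho**2) + k * rho**2 / (1 - rho**2))

def k21(rho: float, k: int, n: int = 1) -> float:
    """Closed form (3.24): independence model relative to the AR(1) model."""
    return 0.5 * n * ((k - 2) * rho**2 - np.log(1 - rho**2))

def jeffreys_divergence(rho: float, k: int, n: int = 1) -> float:
    """Closed form (3.25): symmetric sum of the two directed divergences."""
    return 0.5 * n * (k * rho**2 / (1 - rho**2) + (k - 2) * rho**2)
```

Because all three measures increase with $|\rho|$, a bound of the form $J(f_1 : f_2) \le c$ always yields an interval that is symmetric about zero.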

Figure  3.1:  Posteriors  Based  on  Three  Noninformative  Priors 


Table 3.2 provides 90% and 95% credible intervals for $\rho$ using the procedures
mentioned above. We note that the HPD interval has the shortest length, the equal
tail interval is longer, while the interval based on Jeffreys' divergence measure is the
longest. However, the length of the equal tail interval is nearly the same as that of
the HPD interval. Also, because of the probability matching property, our
recommendation is to use the equal tailed interval.


Table 3.2: Credible Intervals for $\rho$ Based on Different Procedures

procedure           90% interval          length    95% interval          length
equal tail          (0.3010, 0.6520)      0.3510    (0.2660, 0.6850)      0.4190
HPD                 (0.3029, 0.6537)      0.3508    (0.2680, 0.6867)      0.4187
$J(f_1 : f_2)$      (-0.6123, 0.6123)     1.2246    (-0.6513, 0.6513)     1.3026
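
The three kinds of interval compared in Table 3.2 can be computed from any posterior density evaluated on a grid. The following is a minimal sketch under stated assumptions: the posterior is unimodal, the grid is equally spaced on $(-1, 1)$, and the divergence-based interval is taken to be symmetric about zero because $J(f_1 : f_2)$ increases with $|\rho|$. The function names are illustrative, and this is not the code used to produce the table.

```python
import numpy as np

def equal_tail_interval(grid, dens, level):
    """Interval cutting (1 - level)/2 posterior mass from each tail."""
    cdf = np.cumsum(dens / dens.sum())
    alpha = 1.0 - level
    lower = grid[np.searchsorted(cdf, alpha / 2)]
    upper = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return lower, upper

def hpd_interval(grid, dens, level):
    """Highest posterior density interval (assumes a unimodal density)."""
    w = dens / dens.sum()
    order = np.argsort(w)[::-1]                 # grid points by decreasing density
    cum = np.cumsum(w[order])
    keep = order[: np.searchsorted(cum, level) + 1]
    return grid[keep.min()], grid[keep.max()]

def divergence_interval(grid, dens, level):
    """Interval {rho : J <= c}, i.e. |rho| <= r, holding the required posterior mass."""
    w = dens / dens.sum()
    order = np.argsort(np.abs(grid), kind="stable")  # grid points by increasing |rho|
    cum = np.cumsum(w[order])
    r = np.abs(grid[order][np.searchsorted(cum, level)])
    return -r, r
```

When the posterior is concentrated away from zero, as in the example with true $\rho = .5$, the symmetric divergence-based interval must stretch across zero to collect the required mass, which explains why it is much longer than the other two.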

3.5  Concluding  Remarks 


In this chapter, we have derived two- and three-group reference priors when the
parameter of interest is the autocorrelation coefficient in first order nonexplosive
autoregressive models. These priors turn out to be different from Jeffreys' multivariate
prior. The reference priors possess good frequentist properties in that the coverage
probabilities of equal tailed credible intervals for the autocorrelation coefficient
match their frequentist counterparts better than those under Jeffreys' multivariate prior,
especially for small and moderate sample sizes.

Default  priors  are  routinely  being  used  by  the  Bayesians  (and  often  by  non- 
Bayesians  as  well)  for  the  analysis  of  real  life  data.  We  do  not  expect  that  there 
will  be  a single  default  prior  which  will  stand  out  well  above  others  on  every  single 
occasion.  Based  on  the  four  criteria  as  mentioned  in  the  introduction,  there  could 
be  different  default  priors  each  optimal  according  to  one  of  these  criteria.  On  many 
occasions,  however,  reference  priors  are  quite  satisfactory  both  according  to  criteria 
(i)  and  (ii).  Examples  are  found  in  Tibshirani  (1989),  Mukerjee  and  Dey  (1993)  and 
Datta  and  M.  Ghosh  (1995).  The  present  chapter  bears  another  testimony  of  the 
same. 

The two- and three-group reference priors are both satisfactory according to the
matching criterion, and this is borne out by our simulation results. Thus we
recommend the use of either the two-group or the three-group reference prior for the given
problem.


CHAPTER  4 

FAMILIAL  DATA  MODELS 
4.1  Introduction 


In the analysis of familial data the primary aim is to estimate the degree of
resemblance between family members, which is measured by the parent-child correlation
(the interclass correlation coefficient) and the correlation between siblings (the intraclass
correlation coefficient). Several estimates of these correlations have been proposed in the
literature. In particular, Rosner, Donner, and Hennekens (1977) gave the maximum
likelihood estimates when the sib sizes are equal. However, when the sib sizes are not
equal, Rosner (1979) proposed an algorithm for finding the maximum likelihood
estimates which involves iterative implementation and may not even converge for some
sets of data. Mak and Ng (1981) used the linear model approach of Kempthorne
and Tandon (1953) to obtain the maximum likelihood estimates. However, nothing
is known about the convergence of the procedure.

Because families have varying numbers of offspring, the maximum likelihood
estimators of correlations are difficult to compute. To avoid this difficulty, several
estimators have been proposed. For example, Srivastava (1984) transformed the data
and then proposed two alternative estimators. Srivastava and Keen (1988) developed
a noniterative method for estimating the interclass correlation coefficient which is
derived from the technique of weighted sum of squares. Also, Gleser (1992) provided
the formulas for the maximum likelihood estimators of the parameters based on
samples from each individual family, and then combined them in some arbitrary way
over the different families.


In this chapter, we attempt the Bayesian analysis of familial data in the case
when the families considered have equal numbers of offspring. To this end, we have
used certain noninformative priors including the widely used Jeffreys' prior as well
as the different reference priors of Berger and Bernardo (1989, 1992a, 1992b). Also,
probability matching priors are considered. In Section 2, we find the information matrix
and Jeffreys' prior, and also the different reference priors. Also, a
simultaneously-marginally-probability-matching prior is derived which is the same as the five group
reference prior.

In Section 3, we establish propriety of posteriors under a general class of priors
which includes two-group and five-group reference priors and Jeffreys' prior under
certain conditions. Also, in this section, some simulation is undertaken for comparing
the reference priors with Jeffreys' prior. The Bayesian procedure is implemented by the Gibbs
sampler.


4.2  Development  of  Noninformative  Priors 


4.2.1  Fisher  Information  Matrix 

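Suppose that there is a family with $k$ offspring. Let $Y$ denote the measurement on the
mother and $X = (x_1, x_2, \ldots, x_k)^T$ be the vector of measurements on the $k$ offspring.
It is assumed that

$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim N\left[\begin{pmatrix} \mu_m \\ \mu_s 1_k \end{pmatrix},\ \begin{pmatrix} \sigma_m^2 & \rho_{ms}\sigma_m\sigma_s 1_k^T \\ \rho_{ms}\sigma_m\sigma_s 1_k & \sigma_s^2\{(1-\rho_{ss})I_k + \rho_{ss}J_k\} \end{pmatrix}\right],$$

where $1_k$ is the $k \times 1$ vector of ones and $J_k = 1_k 1_k^T$. Letting

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_m^2 & \rho_{ms}\sigma_m\sigma_s 1_k^T \\ \rho_{ms}\sigma_m\sigma_s 1_k & \sigma_s^2\{(1-\rho_{ss})I_k + \rho_{ss}J_k\} \end{pmatrix}, \tag{4.1}$$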


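the joint pdf of $Y$ and $X$ for the given familial data is given by

$$f(y, x; \mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$$
$$= (2\pi)^{-\frac{k+1}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}\begin{pmatrix} y - \mu_m \\ x - \mu_s 1_k \end{pmatrix}^T \Sigma^{-1}\begin{pmatrix} y - \mu_m \\ x - \mu_s 1_k \end{pmatrix}\right\}$$
$$= (2\pi)^{-\frac{k+1}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}\left[(y - \mu_m)^2\,\Sigma_{11\cdot 2}^{-1} + (x - \mu_s 1_k)^T\Sigma_{22\cdot 1}^{-1}(x - \mu_s 1_k)\right]\right\}$$
$$\quad\times \exp\left\{(y - \mu_m)\,\Sigma_{11\cdot 2}^{-1}\Sigma_{12}\Sigma_{22}^{-1}(x - \mu_s 1_k)\right\},$$

where $\Sigma_{11\cdot 2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.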


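In this case,

$$|\Sigma| = |\Sigma_{11\cdot 2}|\,|\Sigma_{22}| = \sigma_m^2\sigma_s^{2k}(1-\rho_{ss})^{k-1}\left(1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right);$$

$$\Sigma_{11\cdot 2}^{-1} = \frac{1 + (k-1)\rho_{ss}}{\sigma_m^2\left(1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right)};$$

$$\Sigma_{22\cdot 1}^{-1} = \frac{1}{\sigma_s^2(1-\rho_{ss})}\left[I_k - \frac{\rho_{ss} - \rho_{ms}^2}{1 + (k-1)\rho_{ss} - k\rho_{ms}^2}\,J_k\right];$$

$$\Sigma_{11\cdot 2}^{-1}\Sigma_{12}\Sigma_{22}^{-1} = \frac{\rho_{ms}}{\sigma_m\sigma_s\left(1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right)}\,1_k^T.$$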
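
These closed forms can be confirmed numerically by building $\Sigma$ from (4.1) and comparing with direct matrix computations. The sketch below is only a check under the covariance structure stated in (4.1); the function names and the parameter values used in any call are hypothetical.

```python
import numpy as np

def familial_cov(k, sigma_m, sigma_s, rho_ms, rho_ss):
    """Covariance matrix of (Y, X^T)^T as in (4.1); index 0 is the mother."""
    cov = np.full((k + 1, k + 1), rho_ss * sigma_s**2)   # sibling covariances
    cov[np.diag_indices(k + 1)] = sigma_s**2             # sibling variances
    cov[0, :] = cov[:, 0] = rho_ms * sigma_m * sigma_s   # mother-offspring covariances
    cov[0, 0] = sigma_m**2                               # mother variance
    return cov

def closed_forms(k, sigma_m, sigma_s, rho_ms, rho_ss):
    """Closed-form determinant and partitioned inverses stated in the text."""
    d = 1 + (k - 1) * rho_ss - k * rho_ms**2
    det = sigma_m**2 * sigma_s**(2 * k) * (1 - rho_ss)**(k - 1) * d
    s11_2_inv = (1 + (k - 1) * rho_ss) / (sigma_m**2 * d)
    s22_1_inv = (np.eye(k) - (rho_ss - rho_ms**2) / d * np.ones((k, k))) / (
        sigma_s**2 * (1 - rho_ss))
    cross = rho_ms / (sigma_m * sigma_s * d) * np.ones(k)
    return det, s11_2_inv, s22_1_inv, cross
```

The quantity $1 + (k-1)\rho_{ss} - k\rho_{ms}^2$ appears in every expression, so it must be positive for the covariance matrix to be positive definite.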


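Note that in order that all covariance matrices in (4.1) be positive definite, it must
be assumed that

$$\sigma_m > 0, \quad \sigma_s > 0, \quad -\frac{1}{k-1} < \rho_{ss} < 1, \quad \rho_{ms}^2 < \rho_{ss} + \frac{1-\rho_{ss}}{k}.$$

The stronger conditions $0 < \rho_{ss} < 1$, $\rho_{ms}^2 < \rho_{ss}$ are imposed by Srivastava and Keen
(1988).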


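On simplification, the above reduces to

$$f(y, x; \mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$$
$$= (2\pi\sigma_m^2)^{-\frac{1}{2}}\exp\left[-\frac{(y - \mu_m)^2}{2\sigma_m^2}\right]$$
$$\quad\times \left\{2\pi\sigma_s^2(1-\rho_{ss})\right\}^{-\frac{k-1}{2}}\exp\left[-\frac{\sum_{j=1}^{k}(x_j - \bar{x})^2}{2\sigma_s^2(1-\rho_{ss})}\right]$$
$$\quad\times \left\{\frac{2\pi\sigma_s^2\left(1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right)}{k}\right\}^{-\frac{1}{2}}\exp\left[-\frac{k}{2\sigma_s^2\left(1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right)}\left\{(\bar{x} - \mu_s) - \frac{\rho_{ms}\sigma_s}{\sigma_m}(y - \mu_m)\right\}^2\right]. \tag{4.2}$$

Consider the following reparameterization of $(\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$:

$$\beta = \frac{\rho_{ms}\sigma_s}{\sigma_m}; \quad \sigma_1^2 = \sigma_s^2(1-\rho_{ss}); \quad \sigma_2^2 = \sigma_s^2\left[1 + (k-1)\rho_{ss} - k\rho_{ms}^2\right]. \tag{4.3}$$

With this transformation, the joint pdf of $Y$ and $X$ reduces to

$$f(y, x; \mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$$
$$= (2\pi\sigma_m^2)^{-\frac{1}{2}}\exp\left[-\frac{(y - \mu_m)^2}{2\sigma_m^2}\right] \times (2\pi\sigma_1^2)^{-\frac{k-1}{2}}\exp\left[-\frac{\sum_{j=1}^{k}(x_j - \bar{x})^2}{2\sigma_1^2}\right]$$
$$\quad\times \left(\frac{2\pi\sigma_2^2}{k}\right)^{-\frac{1}{2}}\exp\left[-\frac{k}{2\sigma_2^2}\left\{\bar{x} - \mu_s - \beta(y - \mu_m)\right\}^2\right]. \tag{4.4}$$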
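
The reparameterization (4.3) and its inverse can be written as a pair of small functions; a round trip through both is a convenient check that the two parameterizations are consistent. This is an illustrative sketch only: the function names are hypothetical, and the inverse map follows from adding the three relations in (4.3), which gives $k\sigma_s^2 = \sigma_2^2 + (k-1)\sigma_1^2 + k\beta^2\sigma_m^2$.

```python
import math

def to_new_parameters(k, sigma_m, sigma_s, rho_ms, rho_ss):
    """Map (sigma_m, sigma_s, rho_ms, rho_ss) to (beta, sigma_1, sigma_2) as in (4.3)."""
    beta = rho_ms * sigma_s / sigma_m
    sigma_1 = sigma_s * math.sqrt(1 - rho_ss)
    sigma_2 = sigma_s * math.sqrt(1 + (k - 1) * rho_ss - k * rho_ms**2)
    return beta, sigma_1, sigma_2

def to_original_parameters(k, beta, sigma_m, sigma_1, sigma_2):
    """Recover (sigma_s, rho_ms, rho_ss) from the new parameterization."""
    var_s = (sigma_2**2 + (k - 1) * sigma_1**2 + k * beta**2 * sigma_m**2) / k
    sigma_s = math.sqrt(var_s)
    rho_ms = beta * sigma_m / sigma_s
    rho_ss = 1 - sigma_1**2 / var_s
    return sigma_s, rho_ms, rho_ss
```

The same inverse map is what turns posterior draws of $(\beta, \sigma_m, \sigma_1, \sigma_2)$ into draws of $(\sigma_s, \rho_{ms}, \rho_{ss})$.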


As mentioned earlier, Jeffreys' prior is the positive square root of the determinant of
the Fisher information matrix. The derivation of the reference priors also stems from
the same matrix. Thus, we find the per unit Fisher information matrix under the
new parameterization $\theta = (\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$ as

$$I(\theta) = \begin{pmatrix} I_1^* & 0^T \\ 0 & I_2^* \end{pmatrix}, \tag{4.5}$$

where 




T*  — 
■*2  — 


Thus, Jeffreys' prior is given by


nj(0) 


(4.6) 


Remark 6 Following Bernardo (1979), if the parameter of interest is $\theta$ (no nuisance
parameter), then Jeffreys' prior is a reference prior.

4.2.2  Reference  Priors 

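In view of the form of the information matrix as given in (4.5), by taking rectangular
compacts for $(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$ it follows from Berger and Bernardo (1992a) or
Datta and M. Ghosh (1995) that the five group reference prior for $\{(\mu_m, \mu_s), \beta, \sigma_m, \sigma_1, \sigma_2\}$
is $\pi_{R5}(\theta) \propto \sigma_m^{-1}\sigma_1^{-1}\sigma_2^{-1}$, while the two group reference prior with the ordering
$\{(\mu_m, \mu_s), (\beta, \sigma_m, \sigma_1, \sigma_2)\}$ or $\{(\beta, \sigma_m, \sigma_1, \sigma_2), (\mu_m, \mu_s)\}$ is $\pi_{R2}(\theta) \propto \sigma_1^{-1}\sigma_2^{-2}$.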

Remark 7 Because of invariance (Datta and M. Ghosh, 1996; Mukerjee and M.
Ghosh, 1997), the two group reference prior and Jeffreys' prior in the
$(\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$ parameterization are given respectively by


7Tk2  oc  (7„V/(1-/9,s)  Ml  + 2(fc-l)ps,]  [i  + (A:-1)ps, -fcpy 
TTj  oc  pssy^[l  + 2{k-l)pss]\l  + {k-l)pss-kpl,,]  . (4.7) 




4.2.3  Probability  Matching  Priors 

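For the specific familial data, the parameter of interest is $\theta = (\theta_1, \ldots, \theta_6) =
(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$. We find a prior which satisfies the probability matching
criterion separately for each component of the parameter vector $\theta$. Such a prior is
referred to as a simultaneously-marginally-probability-matching prior for the different
components of $\theta$. From Datta and J.K. Ghosh (1995), such a prior $\pi(\theta)$ for the
parametric function $t(\theta) = (t_1(\theta), \ldots, t_s(\theta))^T$ is found as a solution of

$$\sum_{l=1}^{p}\frac{\partial}{\partial\theta_l}\left\{\eta_{jl}(\theta)\pi(\theta)\right\} = 0, \quad j = 1, \ldots, s, \tag{4.8}$$

where

$$\eta_j(\theta) = (\eta_{j1}(\theta), \ldots, \eta_{jp}(\theta))^T = \left\{\nabla_{t_j}^T(\theta)\, I^{-1}(\theta)\,\nabla_{t_j}(\theta)\right\}^{-1/2} I^{-1}(\theta)\,\nabla_{t_j}(\theta), \quad j = 1, \ldots, s,$$

and

$$\nabla_{t_j}(\theta) = \left(\partial t_j(\theta)/\partial\theta_1, \ldots, \partial t_j(\theta)/\partial\theta_p\right)^T.$$

For the specific problem, $t(\theta) = (\theta_1, \ldots, \theta_6) = (\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$. From Datta and
J.K. Ghosh (1995), one has the following theorem.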

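Theorem 7 A simultaneously-marginally-probability-matching prior for each
component of $\theta$ is given by

$$\pi(\theta) \propto \sigma_m^{-1}\sigma_1^{-1}\sigma_2^{-1}, \tag{4.9}$$

which is the same as the five group reference prior $\pi_{R5}$.

Proof of Theorem 7. Let $t_j(\theta) = \theta_j$, $j = 1, \ldots, 6$ and $\pi(\theta) \propto \sigma_m^{-1}\sigma_1^{-1}\sigma_2^{-1}$. Then, in particular,

$$\eta_5(\theta) = (0, 0, 0, 0, \sigma_1, 0)^T/\sqrt{2(k-1)};$$

$$\eta_6(\theta) = (0, 0, 0, 0, 0, \sigma_2)^T/\sqrt{2},$$

and thus, (4.8) holds and $\pi(\theta) \propto \sigma_m^{-1}\sigma_1^{-1}\sigma_2^{-1}$ is a simultaneously-marginally-probability-matching
prior.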

On the other hand, if one would like to make a joint statement as opposed to
marginal statements, a joint treatment of these parameters seems to be intuitively
desirable. In particular, for $\theta$ here, we will be interested in obtaining a prior
satisfying the matching criterion jointly. Such a prior is referred to as a
jointly-probability-matching prior. Following Datta (1996), a simultaneously-marginally-probability-matching
prior for the parametric function $t(\theta)$ is jointly-probability-matching if,
and only if, for all $\theta = (\theta_1, \ldots, \theta_p)$,


"k  ^mjfc(^)  0 j,  k,TTl  1,  ' ' ‘j  5, 


(4.10) 


where 


^jkm{0)  = vj  (^)  VL(^)  i.  k,m  = l,---,s. 


For  the  specific  familial  data  problem,  t{6)  = {9i,  • • • 9e),  it  can  be  checked  that 




P{9) 


A2  0^ 
0 I4 


where 


Ao  — 


/c2/5(t„((72  + A:/5V^)  2 


kl/3am{(rl  + A:/3V^) 


In  this  case, 


Cl23  — Vi^23{^)  — 0) 


^231  — V2  ^31  (^)  ““  0) 

612  = vl^ui^)  = (^l{(^2  + + 0- 


Remark  8 Because  of  Datta  (1996),  a simultaneously-marginally-probability-matching 
prior  for  0 is  not  a jointly-probability-matching. 


4.3  Propriety  of  the  Posterior  Distributions 


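First we find the joint posterior distribution of $(\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$ under the
five group reference prior. Suppose $(Y_1, X_1)^T, (Y_2, X_2)^T, \ldots, (Y_n, X_n)^T$ are iid random
vectors from the above familial data model. The likelihood function is given by

$$L(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$$
$$\propto \left\{\sigma_m^{-n}\exp\left[-\frac{1}{2\sigma_m^2}\sum_{j=1}^{n}(y_j - \mu_m)^2\right]\right\}\left\{\sigma_1^{-n(k-1)}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_j)^2\right]\right\}$$
$$\quad\times \left\{\sigma_2^{-n}\exp\left[-\frac{k}{2\sigma_2^2}\sum_{j=1}^{n}\left((\bar{x}_j - \mu_s) - \beta(y_j - \mu_m)\right)^2\right]\right\}. \tag{4.11}$$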


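Writing $y = (Y_1, \ldots, Y_n)$ and $x = (X_1, \ldots, X_n)$, under the five group reference
prior, the joint posterior density is given by

$$\pi_{R5}(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2 \mid y, x)$$
$$\propto \left\{\sigma_m^{-n-1}\exp\left[-\frac{1}{2\sigma_m^2}\sum_{j=1}^{n}(y_j - \mu_m)^2\right]\right\}\left\{\sigma_1^{-n(k-1)-1}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_j)^2\right]\right\}$$
$$\quad\times \left\{\sigma_2^{-n-1}\exp\left[-\frac{k}{2\sigma_2^2}\sum_{j=1}^{n}\left((\bar{x}_j - \mu_s) - \beta(y_j - \mu_m)\right)^2\right]\right\}. \tag{4.12}$$

The following theorem establishes the propriety of $\pi_{R5}(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2 \mid y, x)$.


Theorem 8 $\pi_{R5}(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2 \mid y, x)$ is proper.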


Proof  of  Theorem  8.  First  we  write 


E /^(yj  yrn)] 

j-1 


npi]  - E + E - ^ivj  - 

j=i  j=i 


iE  ^iVj  Mm)]  ^ T Mm)] 

j=l 


^r=i 


n 


1 I " 

E ~ ^(yj  ~ Mm)]  ^ • 


Integration  with  respect  to  pg,  the  joint  posterior  of  {pm,  /5,  crm,cn,cr2)  is  given  by 
P } ^m:  ^2  \ 2/)"®) 




a ^exp 


1 " 

j=l 


— n(fc— 1)— 1 


exp 


E E 

i=l  i 


X 


(Tj  " exp 


A: 

2(t| 


E K'  - - ^^^)f 

j=i 


n 


n 2' 


E ^ /^m)) 

j-l 


. 


Now, 


E [^j  - ^(yj  - yrnf  - ^ 

j=l  n 


-t  2 


E ^ (%■  “ yrn)) 

j=l 


Y,  - 2/5  ^jivi  ~ yrn)  + E (%■  “ ~ “ f^(y  ~ y^)y 

j=i  j-i 


E (%■  - 2/)' 

j=i 


j=l 

2 r 

E"=i(^j  - ^)(%  - y) 

T.U  (»  - tf 

-\2 


E”.i  (w  - y) 


j=i 


Thus,  integrating  with  respect  to  j3,  the  joint  posterior  of  <7^,  (Ti,  <72)  is  obtained 
as 


^H5(A^mj  ^mj  •^1)  *^2  | 2/)*®) 


oc  ^ ^ exp 


X ^ (72  exp 


U 

k 


— n(fc— 1)— 1 

(7i  ^ ^ exp 


k n 


^ E E 

i=i  j=i 


Si. 


2(7q 


Q _ ~xy 


^yp/  J 


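where $S_{xx} = \sum_{j=1}^{n}(\bar{x}_j - \bar{x})^2$, $S_{yy} = \sum_{j=1}^{n}(y_j - \bar{y})^2$, $S_{xy} = \sum_{j=1}^{n}(\bar{x}_j - \bar{x})(y_j - \bar{y})$.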




Integration  with  respect  to  yields  the  joint  posterior  of  (<Tm,  <^1,  ct2)  as 

I Vt^') 


oc  { exp 


■'yy 


-n{k-l)  I ^i-1  ^j=l  (^ij  ^j) 


exp 


2cri 


X < (72  exp 


k _ g, ' 

n ) 


2ao 


'yy 


Note now that $\sigma_m^2$, $\sigma_1^2$, and $\sigma_2^2$ have independent inverse gamma posteriors. Hence, the
joint posterior density of $(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$ under the five group reference prior is
integrable. The result follows.


Second, we consider the joint posterior pdf of $(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$ under the two
group reference prior. Here the joint posterior density is given by

^^R2{^^  m?  ^m)  C’’!)  U’2  I Vi  x) 


OC  <(7^  exp 


U 


{Vj  l^m) 


(7i  ^ ’ exp 


k n 


E E 


X < (72 " ^ exp 


A E ((^J  iVj  - 


(4.13) 


As  in  Theorem  8,  it  can  be  checked  that 

7Ti?2  ((7m>  (7l , (72  | UiX) 


(X 


( 

'l  I 

\ — fl+1 

^yy 

1 ) 

iUm  (i-X-P 

2<t^J 

n 

^ ' exp 


Eil  Y.U  ("^0  - 

2(7? 


X S (72  ” exp 


A-  5^ 

___/'(?  _ 

■ „ 2 Wxa:  ; 

Z(72  Oyy 


The propriety of $\pi_{R2}(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2 \mid y, x)$ immediately follows.


Finally, we consider the joint posterior pdf of $(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2)$ under Jeffreys'
prior. Under Jeffreys' prior, the joint posterior pdf is given by

fJ's,  0 , ^1,  ^2  I y> 


oc  ^exp 


J=l 


a 


— n(fc— 1)— 1 


exp 


k n 


E E 


X < (72  exp 


k ^ 

~T^~2  E ~ ~ ^ ~ Mm)) 

^^2  j=l 


(4.14) 


As  before. 


1 — n 

Q 

^yy 

1 I 

exp 

2<’mJ 

n 

— n(A:— 1)— 1 


exp 


Eti  £?.i  (%  - fj)‘ 

2(7? 


X < (72  exp 


A-  5^ 

__Z_(q  _ \ 

2ar  S,  ’ 


yy  J 


and the propriety of $\pi_J(\mu_m, \mu_s, \beta, \sigma_m, \sigma_1, \sigma_2 \mid y, x)$ follows.


4.4  Further  Priors 


We note that the suggested simultaneously-marginally-probability-matching prior
for $\theta$ is not necessarily a matching prior for every parameter of interest. If we are
interested in $\mu_m$, $\mu_s$, or $\sigma_m$, then the suggested simultaneously-marginally-probability-matching
prior is indeed a first order probability matching prior. This is not, however,
true when the parameter of interest is $\sigma_s$, $\rho_{ms}$, or $\rho_{ss}$. Following Datta and J.K. Ghosh
(1995), we develop the probability matching priors when the parameters of interest
are $\sigma_s$, $\rho_{ms}$, and $\rho_{ss}$.

First, we find the probability matching prior when our parameter of interest is $t(\theta) = \sigma_s$. Since

$$\sigma_s = k^{-1/2}\left[\sigma_2^2 + (k-1)\sigma_1^2 + k\beta^2\sigma_m^2\right]^{1/2}, \tag{4.15}$$


Vt(0) 


d(Tg  d(7s  da  a dag  dag  \ 

dpru  ’ dp,g  ’ dp  ’ dai  ’ da2  J 


1 T 

0,  kPa\,  kPal,  {k  — l)ai,  (72] 


and  rj{0) 


where 


I {0)Vt{0)  ^ al, 

^Vj{0)r\0)Vt{0)  ^ 


u=  (al  + {k-  l)al  + kpPal^ 
hi(/3,(T^,ai,o-2)  = \^(al  + kpPal^^  ^{k-l)a 


-1/2 


Following  Datta  and  J.K.  Ghosh  (1995),  the  probability  matching  equation  is  given 
by 

2al-^  iP  hMO))  + hM^))  + ^ ^ = 0, 




which  has  a solution  given  by 

7t(0)  = h]'^|/3|“^((T^CriC72)“^ 


[(a|  + + (fc  - 1)  O'?]  ' 

\^\{(TmOl(r2f 


(4.16) 


Note that none of the priors derived in the previous section is a first order
probability matching prior if our parameter of interest is $\sigma_s$. Some calculations show that
the posterior derived from (4.16) is not proper.


Second, we develop the probability matching prior if we are interested in $t(\theta) = \rho_{ms}$,
the interclass correlation coefficient. As in the previous case, it can be checked that
since

$$\rho_{ms} = \frac{\sqrt{k}\,\beta\sigma_m}{\left[\sigma_2^2 + (k-1)\sigma_1^2 + k\beta^2\sigma_m^2\right]^{1/2}}, \tag{4.17}$$


r/(0)  oc  h2  [0, 0,  ^ (a|  + (fc  - l)al) , ^ (a|  + (A:  - 1)<t2)  - 


iT 


h2  = [2al  (cr|  + {k  - l)crf)  {aj  + {k  - l)al  -h  kp'^al,)  + P{k-  l)^^a^cr^] 


Thus,  the  probability  matching  prior  is  given  by 

^ {<rl  + {k-  l)a^)  I (hMe))  - {<rlh2m) 

+-p  + {k  - l)o'i)  (^m^27r(0))  - (alh2'K{9)'^  = 0, 

which  has  a solution  given  by 


7t(0)  oc 


(4.18) 


Hence, when our parameter of interest is $\rho_{ms}$, the first order matching prior is given
by


-2-3-3 


2al 


(t|  + {k 


if^2  + {k  — l)^!  + kj3^a^^  + k^{k  — l)/3 


j2  2 4 


-1/2 


(4.19) 


Note that if we are interested in $\rho_{ms}$, then neither Jeffreys' prior nor the reference
priors are first order matching. It can be checked that the posterior under (4.19) is
proper.


Finally, we consider a situation where our parameter of interest is $\rho_{ss}$, the intraclass
correlation coefficient. As before

$$\rho_{ss} = \frac{\sigma_2^2 - \sigma_1^2 + k\beta^2\sigma_m^2}{\sigma_2^2 + (k-1)\sigma_1^2 + k\beta^2\sigma_m^2}, \tag{4.20}$$


T]{0)  oc  (al  + kp^al^) 


0,  0,  2k^al, 


-iT 


(cr|  + kp^a^),  ka^ 


Letting  am,  02)  = {al  + k0^al^)  \ the  probability  matching  equation  is  given  by 
2kal^  {ph3TT{0))+k^P^-£-  (^m^37r(6>))-^^^  ((7i7r(6>))+fc^  [alhiri^])  = 0 


which  has  a solution  given  by 


7t(0)  = 9 ^\p\  ^ 


(4.21) 


Also, if our parameter of interest is $\rho_{ss}$, none of the priors given in the previous section
is a first order probability matching prior. It can be shown that the posterior under
(4.21) is not proper.


4.5  Simulation  Study 


4.5.1  Method 


We compare in this section the two group ($\pi_{R2}$) and five group ($\pi_{R5}$) reference priors
along with Jeffreys' prior ($\pi_J$). We accomplish this by calculating the frequentist
coverage probability of the posterior tail probabilities of each component of the
parameter vector $\theta_1 = (\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$. For example, we consider the parameter
$\rho_{ms}$, which is often of great interest to biological and medical researchers.

The computing work is accomplished in three stages. In the first stage, we generate
1,000 random samples of size n = 20 or 50 from the familial distributions. The second
stage consists of computation of the posterior $\alpha$-quantile of the parameter for each of the 1,000
sets of random samples using the Gibbs sampler. In the third stage, we compute the
coverage probability.

As discussed above, the Gibbs sampler is used to compute the posterior $\alpha$-quantiles
of the parameters given $(Y_1, \ldots, Y_n)$. As a first step, we need to generate random
variables from the marginal posterior distribution of each component of the parameter
vector $\theta$. Such distributions are analytically intractable and require high-dimensional
numerical integration. Instead, we adopt Monte Carlo integration and use Gibbs
sampling. Gibbs sampling, originally introduced by Geman and Geman (1984) and more
recently popularized by Gelfand and Smith (1990), is a Markovian updating scheme
that requires sampling from full conditional distributions. In implementing the Gibbs
sampler, we follow the recommendation of Gelman and Rubin (1992) and run n (> 2)
parallel chains, each for 2d iterations with starting points drawn from an
overdispersed distribution. To diminish the effects of the starting distributions, the first
d iterations of each chain are discarded. After d iterations, all subsequent iterates are
retained for finding the desired posterior distributions as well as for monitoring the
convergence of the Gibbs sampler.
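
The scheme described above (several parallel chains, overdispersed starting points, the first half of each chain discarded) can be sketched in code. The following is a minimal illustration, not the implementation used in this study. It assumes the prior $\pi(\theta) \propto (\sigma_m\sigma_1\sigma_2)^{-1}$ and full conditionals obtained by combining that prior with the likelihood (4.11); the starting distribution, the function names `gibbs_familial` and `gelman_rubin`, and the simplified potential scale reduction factor are all choices made for this sketch.

```python
import numpy as np

def gibbs_familial(y, x, n_chains=10, n_iter=400, seed=0):
    """Gibbs sampler sketch under the prior (sigma_m * sigma_1 * sigma_2)^{-1}.

    y: (n,) mother measurements; x: (n, k) offspring measurements.
    Returns draws of (mu_m, mu_s, beta, sigma_m, sigma_1, sigma_2) with shape
    (n_chains, n_iter - n_iter // 2, 6); the first half of each chain is discarded.
    """
    rng = np.random.default_rng(seed)
    n, k = x.shape
    xbar = x.mean(axis=1)                          # family means
    ss_within = ((x - xbar[:, None]) ** 2).sum()   # within-family sum of squares
    burn = n_iter // 2
    out = np.empty((n_chains, n_iter - burn, 6))
    for c in range(n_chains):
        # Overdispersed starting values (an arbitrary choice for this sketch).
        mu_m, mu_s, beta = rng.normal(0.0, 3.0, size=3)
        var_m, var_1, var_2 = rng.uniform(0.2, 5.0, size=3)
        for t in range(n_iter):
            # mu_m | rest: normal, from the two quadratic terms involving mu_m.
            prec = n * (1.0 / var_m + k * beta**2 / var_2)
            lin = y.sum() / var_m + (k * beta / var_2) * (beta * y - (xbar - mu_s)).sum()
            mu_m = rng.normal(lin / prec, prec**-0.5)
            dy = y - mu_m
            # mu_s | rest and beta | rest: normal.
            mu_s = rng.normal((xbar - beta * dy).mean(), np.sqrt(var_2 / (n * k)))
            syy = (dy**2).sum()
            beta = rng.normal((dy * (xbar - mu_s)).sum() / syy, np.sqrt(var_2 / (k * syy)))
            resid = xbar - mu_s - beta * dy
            # Variances | rest: inverse gamma, drawn as 1 / Gamma(shape, scale=1/rate).
            var_m = 1.0 / rng.gamma(n / 2, 2.0 / syy)
            var_1 = 1.0 / rng.gamma(n * (k - 1) / 2, 2.0 / ss_within)
            var_2 = 1.0 / rng.gamma(n / 2, 2.0 / (k * (resid**2).sum()))
            if t >= burn:
                out[c, t - burn] = (mu_m, mu_s, beta,
                                    np.sqrt(var_m), np.sqrt(var_1), np.sqrt(var_2))
    return out

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter; chains has shape (m, L)."""
    _, length = chains.shape
    within = chains.var(axis=1, ddof=1).mean()
    between_over_l = chains.mean(axis=1).var(ddof=1)
    var_hat = (length - 1) / length * within + between_over_l
    return np.sqrt(var_hat / within)
```

Values of the potential scale reduction factor close to 1 suggest that the retained iterates from the different chains are sampling the same distribution; posterior quantiles can then be read off the pooled draws, for example with `np.quantile`.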




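For the given familial data, to implement and monitor the convergence of the
Gibbs sampler, we consider n=10 parallel chains, each for 2d = 400 iterations with
starting points drawn from an overdispersed distribution. The implementation requires
generation of samples from the following full posterior conditional distributions.


(I) Full conditionals under the two group reference prior:

$\mu_m \mid y, \text{rest} \sim N\left[\left(\frac{1}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}\right)^{-1} n^{-1}\sum_{j=1}^{n}\left(\frac{y_j}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}y_j - \frac{k\beta}{\sigma_2^2}(\bar{x}_j - \mu_s)\right),\ n^{-1}\left(\frac{1}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}\right)^{-1}\right]$

$\mu_s \mid y, \text{rest} \sim N\left[n^{-1}\sum_{j=1}^{n}\{\bar{x}_j - \beta(y_j - \mu_m)\},\ \frac{\sigma_2^2}{nk}\right]$

$\beta \mid y, \text{rest} \sim N\left[\left\{\sum_j (y_j - \mu_m)^2\right\}^{-1}\sum_j (y_j - \mu_m)(\bar{x}_j - \mu_s),\ \left\{\frac{k}{\sigma_2^2}\sum_j (y_j - \mu_m)^2\right\}^{-1}\right]$

$\sigma_m^2 \mid y, \text{rest} \sim IG\left[\frac{n-1}{2},\ \frac{1}{2}\sum_{j=1}^{n}(y_j - \mu_m)^2\right]$

$\sigma_1^2 \mid y, \text{rest} \sim IG\left[\frac{n(k-1)}{2},\ \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_j)^2\right]$

$\sigma_2^2 \mid y, \text{rest} \sim IG\left[\frac{n+1}{2},\ \frac{k}{2}\sum_{j=1}^{n}\{(\bar{x}_j - \mu_s) - \beta(y_j - \mu_m)\}^2\right]$


(II) Full conditionals under the five group reference prior:

$\mu_m \mid y, \text{rest} \sim N\left[\left(\frac{1}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}\right)^{-1} n^{-1}\sum_{j=1}^{n}\left(\frac{y_j}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}y_j - \frac{k\beta}{\sigma_2^2}(\bar{x}_j - \mu_s)\right),\ n^{-1}\left(\frac{1}{\sigma_m^2} + \frac{k\beta^2}{\sigma_2^2}\right)^{-1}\right]$

$\mu_s \mid y, \text{rest} \sim N\left[n^{-1}\sum_{j=1}^{n}\{\bar{x}_j - \beta(y_j - \mu_m)\},\ \frac{\sigma_2^2}{nk}\right]$

$\beta \mid y, \text{rest} \sim N\left[\left\{\sum_j (y_j - \mu_m)^2\right\}^{-1}\sum_j (y_j - \mu_m)(\bar{x}_j - \mu_s),\ \left\{\frac{k}{\sigma_2^2}\sum_j (y_j - \mu_m)^2\right\}^{-1}\right]$

$\sigma_m^2 \mid y, \text{rest} \sim IG\left[\frac{n}{2},\ \frac{1}{2}\sum_{j=1}^{n}(y_j - \mu_m)^2\right]$

$\sigma_1^2 \mid y, \text{rest} \sim IG\left[\frac{n(k-1)}{2},\ \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_j)^2\right]$

$\sigma_2^2 \mid y, \text{rest} \sim IG\left[\frac{n}{2},\ \frac{k}{2}\sum_{j=1}^{n}\{(\bar{x}_j - \mu_s) - \beta(y_j - \mu_m)\}^2\right]$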

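(III) Full conditionals under Jeffreys’ prior:

\[
\begin{aligned}
\mu_m \mid y, \text{rest} &\sim N\left[\left(\frac{n}{\sigma_1^2}+\frac{n\beta^2}{\sigma_3^2}\right)^{-1}\sum_{j=1}^{n}\left\{\frac{y_j}{\sigma_1^2}+\frac{\beta^2 y_j}{\sigma_3^2}-\frac{\beta}{\sigma_3^2}\left(\bar{x}_j-\mu_s\right)\right\},\ \left(\frac{n}{\sigma_1^2}+\frac{n\beta^2}{\sigma_3^2}\right)^{-1}\right],\\
\mu_s \mid y, \text{rest} &\sim N\left[n^{-1}\sum_{j=1}^{n}\left\{\bar{x}_j-\beta\left(y_j-\mu_m\right)\right\},\ \frac{\sigma_3^2}{n}\right],\\
\beta \mid y, \text{rest} &\sim N\left[\Big\{\sum_{j}(y_j-\mu_m)^2\Big\}^{-1}\sum_{j}(y_j-\mu_m)(\bar{x}_j-\mu_s),\ \Big\{\sigma_3^{-2}\sum_{j}(y_j-\mu_m)^2\Big\}^{-1}\right],
\end{aligned}
\]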




Table 4.1: Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities
of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms} = 0.1$ and $\rho_{ss} = 0.9$

                           n = 20                                  n = 50
               $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$     $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$
$\mu_m$          0.958        0.957        0.950         0.940        0.955        0.948
$\mu_s$          0.964        0.963        0.977         0.957        0.950        0.969
$\sigma_m$       0.957        0.955        0.945         0.942        0.951        0.942
$\sigma_s$       0.919        0.936        0.905         0.956        0.951        0.941
$\rho_{ms}$      0.954        0.961        0.962         0.959        0.949        0.954
$\rho_{ss}$      0.937        0.935        0.925         0.957        0.946        0.946

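\[
\begin{aligned}
\sigma_1^2 \mid y, \text{rest} &\sim IG\left[\,\cdot\,,\ \tfrac{1}{2}\sum_{j=1}^{n}(y_j-\mu_m)^2\right],\\
\sigma_2^2 \mid y, \text{rest} &\sim IG\left[\,\cdot\,,\ \tfrac{1}{2}\sum_{i}\sum_{j=1}^{n}(x_{ij}-\bar{x}_j)^2\right],\\
\sigma_3^2 \mid y, \text{rest} &\sim IG\left[\,\cdot\,,\ \tfrac{1}{2}\sum_{j=1}^{n}\left\{(\bar{x}_j-\mu_s)-\beta(y_j-\mu_m)\right\}^2\right].
\end{aligned}
\]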
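One sweep through full conditionals of this kind can be sketched as follows. The sketch rests on an assumed reading of the conditionals, namely that $y_j \sim N(\mu_m, \sigma_1^2)$ and $\bar{x}_j \mid y_j \sim N(\mu_s + \beta(y_j - \mu_m), \sigma_3^2)$, with $\sigma_2^2$ tied to the within-family sum of squares. The names `s1`, `s2`, `s3`, `ss_within`, and the inverse gamma shape parameters in `shapes` are hypothetical placeholders: the shapes depend on the prior and must be supplied by the user.

```python
import numpy as np

def gibbs_sweep(state, y, xbar, ss_within, shapes, rng):
    """One sweep over (mu_m, mu_s, beta, s1, s2, s3); s1, s2, s3 are variances.

    y: mothers' scores; xbar: family means of offspring scores;
    ss_within: sum over families of within-family squared deviations;
    shapes: assumed inverse gamma shape parameters a1, a2, a3.
    """
    n = y.size
    mu_s, beta = state["mu_s"], state["beta"]
    s1, s3 = state["s1"], state["s3"]

    # mu_m | rest: combines the marginal of y and the regression of xbar on y
    prec = n / s1 + n * beta**2 / s3
    num = np.sum(y / s1 + beta * (beta * y - (xbar - mu_s)) / s3)
    mu_m = rng.normal(num / prec, np.sqrt(1.0 / prec))

    # mu_s | rest
    mu_s = rng.normal(np.mean(xbar - beta * (y - mu_m)), np.sqrt(s3 / n))

    # beta | rest
    syy = np.sum((y - mu_m) ** 2)
    beta = rng.normal(np.sum((y - mu_m) * (xbar - mu_s)) / syy, np.sqrt(s3 / syy))

    # variances: an IG(a, b) draw is b divided by a Gamma(a, 1) draw
    s1 = 0.5 * syy / rng.gamma(shapes["a1"])
    s2 = 0.5 * ss_within / rng.gamma(shapes["a2"])
    resid = (xbar - mu_s) - beta * (y - mu_m)
    s3 = 0.5 * np.sum(resid**2) / rng.gamma(shapes["a3"])
    return dict(mu_m=mu_m, mu_s=mu_s, beta=beta, s1=s1, s2=s2, s3=s3)

# Usage on synthetic data (all numbers below are illustrative assumptions).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 50)
xbar = 0.4 * y + rng.normal(0.0, 0.5, 50)
state = dict(mu_m=0.0, mu_s=0.0, beta=0.0, s1=1.0, s2=1.0, s3=1.0)
shapes = dict(a1=25.0, a2=25.0, a3=25.0)
for _ in range(400):
    state = gibbs_sweep(state, y, xbar, ss_within=10.0, shapes=shapes, rng=rng)
print(state)
```

Under this reading, changing the prior changes only the shape parameters passed in `shapes`; the normal updates for the location and slope parameters stay the same.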

As a second step, we obtain random samples from the marginal posterior
distribution of each component of the parameter vector
$\theta_1 = (\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$
using (4.15), (4.17), and (4.21).

Finally, we compute the posterior $\alpha$-quantile of each component of
$(\mu_m, \mu_s, \sigma_m, \sigma_s, \rho_{ms}, \rho_{ss})$ from the resulting histograms.
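The coverage calculation behind this last step can be illustrated with a small sketch. It is not the familial-data simulation: as a stand-in (an assumption made for brevity), each replicate uses a normal mean with known unit variance and a flat prior, so posterior draws can be generated directly instead of by Gibbs sampling. The function name and its arguments are hypothetical.

```python
import numpy as np

def estimated_coverage(alpha, n, n_rep, n_draws, true_mu, rng):
    """Fraction of replicates in which true_mu lies below the posterior alpha-quantile."""
    hits = 0
    for _ in range(n_rep):
        data = rng.normal(true_mu, 1.0, size=n)
        # stand-in for sampler output: posterior is N(ybar, 1/n) under a flat prior
        draws = rng.normal(data.mean(), 1.0 / np.sqrt(n), size=n_draws)
        q_alpha = np.quantile(draws, alpha)  # posterior alpha-quantile from the draws
        hits += int(true_mu <= q_alpha)
    return hits / n_rep

rng = np.random.default_rng(0)
print(estimated_coverage(alpha=0.95, n=20, n_rep=2000, n_draws=500, true_mu=0.0, rng=rng))
```

For a prior that matches well, the estimated frequency should be close to the nominal level $\alpha$; departures from $\alpha$ are what the tables of estimated coverage probabilities report.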


4.5.2  Results 


We generate random samples of sizes n = 20 and n = 50 from the familial distribution.
Throughout, we take $\mu_m = \mu_s = 0$ and $\sigma_m = \sigma_s = 1.0$, and consider three
values of $(\rho_{ms}, \rho_{ss})$: (0.3, 0.7), (0.1, 0.9), and (0.5, 0.5). The tables report
the results for each of these three settings. Although none of the priors is first order
matching in the original parameterization, the two group reference prior has a slight edge over
the others in terms of the coverage probability.
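Data of this kind can be generated as in the sketch below. It assumes the usual exchangeable structure for familial data: the mother's score has variance $\sigma_m^2$, each offspring score has variance $\sigma_s^2$, the mother-offspring correlation is $\rho_{ms}$, and the correlation between two offspring is $\rho_{ss}$. The number of offspring per family, `k`, is not stated in this section, so `k = 3` in the usage line is an arbitrary assumption, and the function names are hypothetical.

```python
import numpy as np

def familial_cov(k, sigma_m, sigma_s, rho_ms, rho_ss):
    """Covariance matrix of (mother, offspring 1, ..., offspring k)."""
    cov = np.empty((k + 1, k + 1))
    cov[0, 0] = sigma_m**2
    cov[0, 1:] = cov[1:, 0] = rho_ms * sigma_m * sigma_s
    cov[1:, 1:] = sigma_s**2 * ((1 - rho_ss) * np.eye(k) + rho_ss * np.ones((k, k)))
    return cov

def simulate_families(n, k, mu_m, mu_s, cov, rng):
    """Draw n families; column 0 is the mother, columns 1..k are the offspring."""
    mean = np.concatenate(([mu_m], np.full(k, mu_s)))
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(0)
cov = familial_cov(k=3, sigma_m=1.0, sigma_s=1.0, rho_ms=0.3, rho_ss=0.7)
data = simulate_families(n=20, k=3, mu_m=0.0, mu_s=0.0, cov=cov, rng=rng)
print(data.shape)
```

The covariance matrix must be positive definite, which restricts how large $\rho_{ms}$ can be for a given $\rho_{ss}$ and `k`; the three settings above satisfy this restriction for `k = 3`.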




Table 4.2: Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities
of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms} = 0.3$ and $\rho_{ss} = 0.7$

                           n = 20                                  n = 50
               $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$     $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$
$\mu_m$          0.906        0.917        0.908         0.884        0.864        0.883
$\mu_s$          0.951        0.953        0.959         0.941        0.958        0.961
$\sigma_m$       0.932        0.959        0.951         0.940        0.945        0.956
$\sigma_s$       0.938        0.938        0.915         0.942        0.935        0.924
$\rho_{ms}$      0.959        0.949        0.958         0.955        0.945        0.955
$\rho_{ss}$      0.950        0.956        0.928         0.955        0.939        0.949

Table 4.3: Estimated Frequentist Coverage Probability of the Posterior Tail Probabilities
of Each Component of the Parameter Vector $\theta_1$, When $\rho_{ms} = 0.5$ and $\rho_{ss} = 0.5$

                           n = 20                                  n = 50
               $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$     $\pi_{R2}$   $\pi_{R5}$   $\pi_{J}$
$\mu_m$          0.805        0.833        0.811         0.807        0.778        0.789
$\mu_s$          0.891        0.896        0.908         0.909        0.864        0.900
$\sigma_m$       0.940        0.926        0.935         0.955        0.958        0.958
$\sigma_s$       0.945        0.936        0.926         0.942        0.950        0.944
$\rho_{ms}$      0.945        0.929        0.960         0.955        0.967        0.957
$\rho_{ss}$      0.959        0.940        0.934         0.948        0.963        0.948



4.6  Concluding  Remarks 

In this chapter, we have developed noninformative priors for familial data
when all families have the same number of offspring. Two-group and five-group reference
priors have been derived along with Jeffreys’ prior. The five-group reference prior
coincides with a simultaneously marginally probability matching prior,
but it is not a jointly probability matching prior.


CHAPTER  5 

SUMMARY  AND  FUTURE  RESEARCH 
5.1  Summary 


In  this  dissertation,  we  have  derived  probability  matching  priors  and  reference 
priors  along  with  Jeffreys’  priors  for  the  intraclass  model,  first  order  autoregressive 
model,  and  familial  data  model.  Orthogonal  transformation  of  parameters  is  found  to 
facilitate  the  derivations.  The  propriety  of  the  posteriors  derived  from  the  different 
priors  is  discussed  and  explicit  posterior  distributions  of  the  parameter  of  interest  are 
derived  for  both  the  intraclass  model  and  the  first  order  autoregressive  model. 

For  the  intraclass  model,  the  one-at-a-time  reference  prior  turns  out  to  be  a second 
order  matching  prior,  and  it  is  different  from  Jeffreys’  prior.  The  latter  is  not  a second 
order  matching  prior  for  the  intraclass  model.  For  the  first  order  autoregressive  model, 
it  has  been  shown  that  the  reference  priors  are  all  first  order  probability  matching 
priors,  but  Jeffreys’  prior  is  not.  For  the  familial  data,  a five  group  reference  prior  is 
derived  which  is  the  same  as  a simultaneously-marginally-probability  matching  prior, 
but is not a jointly-probability-matching prior. Computer simulations illustrate that,
even for small samples, reference priors have a distinct advantage over Jeffreys’
prior in terms of matching the target coverage probabilities in a frequentist sense.

For  both  the  intraclass  model  and  the  first  order  autoregressive  model,  we  consider 
examples  to  compare  the  Bayesian  credible  intervals  of  the  parameter  of  interest  based 
on  (i)  equal-tail,  (ii)  HPD,  and  (iii)  divergence  measures  having  the  same  coverage 
probability.  The  general  power  divergence  measure  includes  the  Kullback-Leibler 
and Hellinger divergence measures. It is shown that HPD intervals have the shortest
length in both cases. However, for the first order autoregressive model, the length of the
equal-tail interval is nearly the same as that of the HPD interval.
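The comparison between equal-tail and HPD intervals can be illustrated from posterior draws. The sketch below is not the dissertation's computation: it approximates the HPD interval of a unimodal posterior by the shortest window that contains the required fraction of sorted draws, and it uses gamma and normal samples as assumed stand-ins for a skewed and a nearly symmetric posterior. The function names are hypothetical.

```python
import numpy as np

def equal_tail_interval(draws, level):
    """Interval between the (1 - level)/2 and (1 + level)/2 sample quantiles."""
    tail = (1.0 - level) / 2.0
    return float(np.quantile(draws, tail)), float(np.quantile(draws, 1.0 - tail))

def hpd_interval(draws, level):
    """Shortest interval containing a fraction `level` of the sorted draws."""
    x = np.sort(np.asarray(draws))
    n = x.size
    m = min(max(int(np.ceil(level * n)), 1), n)  # number of draws inside the interval
    widths = x[m - 1:] - x[: n - m + 1]          # width of every window of m draws
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + m - 1])

rng = np.random.default_rng(0)
skewed = rng.gamma(2.0, size=20000)     # skewed stand-in posterior
symmetric = rng.normal(size=20000)      # nearly symmetric stand-in posterior
for name, draws in [("skewed", skewed), ("symmetric", symmetric)]:
    et = equal_tail_interval(draws, 0.95)
    hp = hpd_interval(draws, 0.95)
    print(name, "equal-tail length:", et[1] - et[0], "HPD length:", hp[1] - hp[0])
```

For a skewed posterior the HPD interval is noticeably shorter than the equal-tail interval, while for a nearly symmetric posterior the two lengths are close, which is the pattern described above for the two models.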




5.2  Ideas  for  Future  Research 


Some ideas for future research are as follows:

1.  Develop  the  Bayesian  analysis  for  general  time  series  data. 

We found the reference priors along with Jeffreys’ prior when the parameter
of interest is the autocorrelation coefficient in first order nonexplosive
autoregressive models. Most studies in the analysis of time series data are based on
frequentist approaches. It would be interesting to study this problem in a Bayesian way
using noninformative priors. For example, we could extend Bayesian inference
to higher order autoregressive models. We also plan to provide Bayesian
analyses for other time series models such as ARMA and ARIMA models.

2. Extend the Bayesian analysis of familial data to different numbers
of offspring.

We developed the Bayesian analysis for familial data when every family has the
same number of offspring. We plan to extend it to the case where different families
have different numbers of offspring.

3.  Provide  the  Bayesian  inference  for  the  general  mixed  model. 

Mixed models are widely used in statistics and have diverse applications, such
as in genetics, dairy sciences, and economics. The general linear mixed model
is given by

\[
Y = X\beta + Zu + \epsilon,
\]

where $Y$ is a $k \times 1$ random vector, $X$ and $Z$ are given matrices, $\beta$ is a $p \times 1$
vector, and $u$ and $\epsilon$ are independently distributed such that $u \sim MN_k(0, \Sigma_k)$ and
$\epsilon \sim MN_k(0, \sigma^2 I_k)$.

Consider the special case in which $\Sigma_k$ is a scalar multiple of $I_k$. In this special case, we
consider testing the null hypothesis that the variance of the random effects $u$ is zero. This
testing problem can be viewed as a model selection between the mixed model and the fixed
effect model. Thus, our parameter of interest could be this variance component. We plan to
find Jeffreys’, reference, and probability matching priors for the mixed model. Propriety
results for the posteriors under these noninformative priors should also be investigated.


REFERENCES 


[1] Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis (2nd edition),
Springer-Verlag, New York.

[2] Berger, J., and Bernardo, J.M. (1989). “Estimating a product of means: Bayesian
analysis with reference priors,” Journal of the American Statistical Association,
84, 200-207.

[3] Berger, J., and Bernardo, J.M. (1992a). “On the development of reference priors”
(with discussion), Bayesian Statistics, Oxford University Press, 35-60.

[4] Berger, J., and Bernardo, J.M. (1992b). “Reference priors in a variance components
problem,” Bayesian Analysis in Statistics and Econometrics, Springer-Verlag,
New York, 177-194.

[5] Bernardo, J.M. (1979). “Reference posterior distributions for Bayesian inference”
(with discussion), Journal of the Royal Statistical Society, Ser. B, 41, 113-147.

[6] Box, G.E.P., and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco.

[7] Brillinger, D.R. (1981). Time Series: Data Analysis and Theory, Holt, Rinehart
and Winston, New York.

[8] Broemeling, L.D. (1985). Bayesian Analysis of Linear Models, Marcel Dekker, New
York.

[9] Chow, G.C. (1975). “Multiperiod predictions from stochastic difference equations
by Bayesian methods,” Studies in Bayesian Econometrics and Statistics, ch. 8.

[10] Cox, D.R., and Reid, N. (1987). “Parameter orthogonality and approximate conditional
inference” (with discussion), Journal of the Royal Statistical Society, Ser.
B, 49, 1-39.

[11] Cressie, N. and Read, T. (1984). “Multinomial goodness-of-fit tests,” Journal of
the Royal Statistical Society, Ser. B, 46, 440-464.

[12] Datta, G.S. (1996). “On priors providing frequentist validity of Bayesian inference
for multiple parametric functions,” Biometrika, 83, 289-298.

[13] Datta, G.S., and Ghosh, J.K. (1995). “On priors providing frequentist validity for
Bayesian inference,” Biometrika, 82, 37-45.




[14] Datta, G.S., and Ghosh, M. (1995). “Some remarks on noninformative priors,”
Journal of the American Statistical Association, 90, 1357-1363.

[15] Datta, G.S., and Ghosh, M. (1996). “On the invariance of noninformative priors,”
The Annals of Statistics, 24, 141-159.

[16] Diaz, J. and Farah, J. (1982). “Bayesian identification of autoregressive processes,”
22nd NBER-NSF Seminar on Bayesian Inference in Econometrics.

[17] DiCiccio, T.J., and Stern, S.E. (1994). “Frequentist and Bayesian Bartlett correction
of test statistics based on adjusted profile likelihoods,” Journal of the Royal
Statistical Society, Ser. B, 56, 397-408.

[18] Durbin, J. (1980). “Approximations for densities of sufficient estimators,”
Biometrika, 67, 311-333.

[19] Efron, B. (1986). “Why isn’t everyone a Bayesian” (with discussion), The American
Statistician, 40, 1-11.

[20] Fuller, W.A. (1976). Introduction to Statistical Time Series, John Wiley and Sons,
Inc., New York.

[21] Gelfand, A.E., and Smith, A.F.M. (1990). “Sampling-based approaches to calculating
marginal densities,” Journal of the American Statistical Association, 85,
398-409.

[22] Gelman, A.E., and Rubin, D. (1992). “Inference from iterative simulation” (with
discussion), Statistical Science, 7, 457-511.

[23] Geman, S., and Geman, D. (1984). “Stochastic relaxation, Gibbs distributions,
and Bayesian restoration of images,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, 6, 721-741.

[24] Ghosh, J.K. (1994). “Higher order asymptotics,” NSF-CBMS Regional Conference
Series in Probability and Statistics, 4, 86.

[25] Ghosh, J.K., and Mukerjee, R. (1991). “Characterization of priors under which
Bayesian and frequentist Bartlett corrections are equivalent in the multiparameter
case,” Journal of Multivariate Analysis, 38, 385-393.

[26] Ghosh, J.K., and Mukerjee, R. (1995). “Frequentist validity of highest posterior
density regions in the presence of nuisance parameters,” Statistics and Decisions,
13, 131-138.

[27] Gleser, L.J. (1992). “A note on the analysis of familial data,” Biometrika,
79, 412-415.

[28] Haavelmo, T. (1947). “Methods of measuring the marginal propensity to consume,”
Journal of the American Statistical Association, 42, 105-122.




[29] Jeffreys, H. (1961). Theory of Probability, 3rd edition, Oxford: Clarendon Press.

[30] Kempthorne, O. and Tandon, O.B. (1953). “The estimation of heritability by regression
of offspring on parent,” Biometrics, 9, 90-100.

[31] Lahiff, M. (1980). “Time series forecasting with informative prior distribution,”
Technical Report No. 111, Department of Statistics, University of Chicago.

[32] Laplace, P.S. (1812). Théorie Analytique des Probabilités, Courcier, Paris.

[33] Lee, C.B. (1989). “Comparisons of frequentist coverage probability and Bayesian
posterior coverage probability and applications,” Ph.D. Thesis, Purdue University.

[34] Lindley, D.V. (1958). “Fiducial distributions and Bayes’ theorem,” Journal of the
Royal Statistical Society, Ser. B, 20, 102-107.

[35] Litterman, R.B. (1980). “A Bayesian procedure for forecasting with vector
autoregressions,” manuscript, Department of Economics, MIT.

[36] Mak, T.K. and Ng, K.W. (1981). “Analysis of familial data: Linear-model
approach,” Biometrika, 68, 457-461.

[37] Mukerjee, R. and Dey, D.K. (1993). “Frequentist validity of posterior quantiles in
the presence of a nuisance parameter: High-order asymptotics,” Biometrika, 80,
499-505.

[38] Mukerjee, R. and Ghosh, M. (1997). “Second order probability matching priors,”
Biometrika, 84, 970-975.

[39] Peers, H.W. (1965). “On confidence points and Bayesian probability points in the
case of several parameters,” Journal of the Royal Statistical Society, Ser. B, 27,
9-16.

[40] Rosner, B. (1979). “Maximum likelihood estimation of interclass correlation,”
Biometrika, 66, 533-538.

[41] Rosner, B., Donner, A. and Hennekens, C.H. (1977). “Estimation of interclass
correlation from familial data,” Applied Statistics, 26, 179-187.

[42] Savage, L.J. (1961). “The subjective basis of statistical practice,” Technical Report,
Department of Statistics, University of Michigan, Ann Arbor.

[43] Searle, S.R. (1971). Linear Models, Wiley, New York.

[44] Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components,
Wiley, New York.

[45] Severini, T.A. (1991). “On the relationship between Bayesian and non-Bayesian
interval estimates,” Journal of the Royal Statistical Society, Ser. B, 53, 611-618.




[46] Srivastava, M.S. (1984). “Estimation of interclass correlations in familial data,”
Biometrika, 71, 177-185.

[47] Srivastava, M.S. and Keen, K.J. (1988). “Estimation of the interclass correlation
coefficient,” Biometrika, 75, 731-739.

[48] Sun, D. and Ye, K. (1996). “Frequentist validity of posterior quantiles for a two
parameter exponential family,” Biometrika, 83, 55-65.

[49] Thornber, H. (1967). “Finite sample Monte Carlo studies: An autoregressive
illustration,” Journal of the American Statistical Association, 62, 801-818.

[50] Tibshirani, R. (1989). “Noninformative priors for one parameter of many,”
Biometrika, 76, 604-608.

[51] Welch, B. and Peers, H.W. (1963). “On formulae for confidence points based on
integrals of weighted likelihoods,” Journal of the Royal Statistical Society, Ser.
B, 25, 318-329.

[52] Yin, M. (1998). “Asymptotic expansion for posterior probability in regression
model,” Statistics and Decisions, 16, 349-368.

[53] Yin, M. and Ghosh, M. (1997). “A note on the probability difference between
matching priors based on posterior quantiles and on inversion of conditional
likelihood ratio statistics,” Calcutta Statistical Association Bulletin, 47, 59-65.

[54] Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics, John
Wiley and Sons, Inc., New York.

[55] Zellner, A. and Tiao, G.C. (1964). “Bayesian analysis of the regression model with
autocorrelated errors,” Journal of the American Statistical Association, 59,
763-778.


BIOGRAPHICAL  SKETCH 


Jungeun Heo was born in Kuryongpo, Kyungbuk, Korea, on November 16, 1969.
In 1993, she received her Bachelor of Science degree in statistics at Yeungnam University
and entered the master’s program with a full scholarship. In 1995, she obtained a
Master of Science degree in statistics at Yeungnam University. She then worked as a
part-time lecturer both in the Department of Statistics at Yeungnam University and
in the Department of Economics at Sangju University until 1996. In August of 1996,
she began her Ph.D. program in the Department of Statistics at the University of
Florida. In April of 2000, she received the Outstanding Academic Award. She enjoys
singing, sports, and travel.




I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to  accept- 
able standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Malay  Ghosh, 

Distinguished  Professor  of  Statistics 

I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


Ramon  Littell 
Professor  of  Statistics 

I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to  accept- 
able standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Andrew  Rosalsky 
Professor  of  Statistics 

I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to  accept- 
able standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as a dissertation for the degree of Doctor of Philosophy.


obett 

Associate Professor of Statistics


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to  accept- 
able standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 



Irene  Hueter 

Assistant  Professor  of  Mathematics 


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Department  of 
Statistics  in  the  College  of  Liberal  Arts  and  Sciences  and  to  the  Graduate  School  and 
was  accepted  as  partial  fulfillment  of  the  requirements  for  the  degree  of  Doctor  of 
Philosophy. 

August  2000  

Dean,  Graduate  School 


